Abstract

A self-stabilizing protocol is one that begins to behave correctly in bounded time, no matter 
what state the protocol is started in. Self-stabilization abstracts the ability of a protocol to 
tolerate arbitrary faults that stop. We investigate the power and applicability of local checking 
and correction for the design of stabilizing network protocols. 

A link subsystem is a pair of neighboring nodes and the two links between them. Intuitively, 
a protocol P is locally checkable if whenever P is in a bad state, some link subsystem is also in 
a bad state. A protocol P is locally correctable if P can be corrected to a good state by locally 
correcting link subsystems. 

We present four general techniques for designing stabilizing protocols. We first show that 
every locally checkable and correctable protocol can be stabilized in time proportional to the 
height of an underlying partial order. Second, we show that every locally checkable protocol 
on a tree can be stabilized in time proportional to the height of the tree. Third, we show that 
every locally checkable protocol can be stabilized in time proportional to the number of network 
nodes. The third result shows that we can dispense with the need for local correctability or the
need for the underlying topology to be a tree as long as we are willing to pay a higher price
in stabilization time. Fourth, we show that any deterministic synchronous protocol π can be
converted to an asynchronous, stabilizing version of π. The fourth technique is useful because
there are network tasks for which a synchronous protocol exists but for which no asynchronous, 
locally checkable solution is known. 

We also present two useful heuristics. The first heuristic, that of removing unexpected packet 
transitions, can often be used to transform a protocol into a locally checkable equivalent. A 
number of existing protocols work in a dynamic network model where links can fail and recover. 
The second heuristic states that locally checkable protocols for dynamic networks can sometimes 
be made locally correctable. The basic idea is to use the link failure and recovery actions of 
the original protocol to locally correct link subsystems. 

Together our techniques cover a broad range of networking tasks. We use our general
techniques to construct new or improved stabilizing solutions to many specific tasks, such as
Mutual Exclusion, Network Resets, Spanning Trees, Topology Update, and Min Cost Flows. Many
of our solutions are practical and can be applied to real networks without appreciable loss in 
efficiency. For example, the messages required for local checking can easily be piggybacked on 
the "keep-alive" traffic sent between neighbors in real networks. 

Our techniques also help in succinctly understanding existing stabilizing protocols. We
define a special case of local checking called one-way checking. We show that many existing
protocols implicitly use one-way checking together with two other methods that we call counter 
flushing and timer flushing. 

In the past, papers on stabilization have avoided message passing models of communication
because of the problems caused by data links with unbounded storage. In a stabilizing setting,
such links can be initialized with an unbounded number of fictitious packets. Thus almost any 
non-trivial network task is impossible in a stabilizing setting in which the links have unbounded 
storage and the nodes are restricted to be finite state machines. We avoid this problem by using 
the standard asynchronous message passing model of a computer network except that each link 
is what we call a Unit Storage Data Link (UDL) that can store at most one packet. Our UDL 
model can be implemented over real physical channels. Our UDL model also generalizes easily 
to a Bounded Storage Data Link which can store a constant number of packets. 

We introduce a new definition of stabilization in terms of the external behavior of a system. 
The definition allows us to say that an automaton A stabilizes to another automaton B even
though A and B have different state sets. The definition also allows a clean statement of a 
useful Modularity Theorem. This theorem allows us to prove that a large system is stabilizing 
by proving that each of its pieces is stabilizing. 

Keywords: Self-Stabilization, Fault-Tolerance, Network Protocols, Distributed Algorithms, 
Local Checking and Correction. 
Chapter 1

Introduction



In physics we often talk about systems that stabilize to a good state after initial 
perturbations. For example, a spring eventually stabilizes after being compressed. 
More generally, systems can stabilize to good behavior after an initial perturbation, 
where a behavior is a description of how the state changes with time. For example, 
a missile with a tracking system will continue to move towards its target after it is 
momentarily thrown off course by bad weather. In these examples drawn from physics 
and control theory, the states are continuous variables and the state transitions are 
described by differential equations. 

By contrast, in this thesis we will concentrate on computer systems, and especially 
systems of computers that are interconnected by networks. In such systems, states are 
described by discrete variables and state transitions are described by transition rules, 
often in the form of programs. We will focus on the ability of such computer systems 
to stabilize to "correct behavior" after an arbitrary initial perturbation. This property
was called self-stabilization by Dijkstra [Dij74]. The "self" emphasizes the ability of
the system to stabilize by itself without manual intervention. 



1.1 A Door Closing Protocol 

A story illustrates the basic idea. Imagine that you live in a house in Alaska in the 
middle of winter. You establish the following protocol (set of rules) for people who 
enter and leave your house. Anybody who leaves or enters the house must shut the
door after them. If the door is initially shut, and nobody makes a mistake, then the
door will eventually return to the closed position. Suppose, however, that the door 
is initially open or that somebody forgets to shut the door after they leave. Then 
the door will stay open until somebody passes through again. This can be a problem 
if heating bills are expensive and if several hours can go by before another person 
goes through the door. It is often a good idea to make the door closing protocol
self-stabilizing. This can be done by adding a spring (or automatic door closer) that
constantly restores the door to the closed position. 

We can model this situation as a state transition system. To keep things simple, 
let us assume that the door is only used to leave the house, and another door is used 
to enter. The state of the system consists of two Boolean variables, in_threshold and
door_open. Variable in_threshold is true if and only if a person is in the threshold
of the door waiting to go out. Variable door_open is true if and only if the door is
open. We use state transitions called ENTER_THRESHOLD, OPEN_DOOR and LEAVE
to model the action of a person entering the threshold, opening the door, and leaving 
respectively. We will model errors by allowing the initial values of the two Boolean 
variables to be arbitrary. 

The code for these routines is described in Figure 1.1. The actions are described 
in terms of "preconditions" and "effects". Preconditions are the enabling conditions 
that must be true before an action can be taken. For instance, we don't allow the 
OPEN_DOOR action to be taken unless there is a person in the threshold. Effects are
the results of an action. For instance, the LEAVE action shuts the door. This style of 
description is used throughout the thesis. 

The code also specifies that certain actions take place in time t if they are continu- 
ously enabled. By this we mean that if the preconditions of the action remain true for 
time t, then the action must occur within time t. We will use such timing conditions 
throughout the thesis. 

Next, we say that the door closing system is correct if whenever the door is open, 
there is somebody in the threshold. We write this more formally using the predicate
OK_Door = (if door_open = true then in_threshold = true). For the door closing
system to be self-stabilizing we want OK_Door to eventually hold regardless of what
state the system starts in. But it is easy to see that if initially door_open = true and
in_threshold = false then the predicate OK_Door may never hold: the only action enabled
in that state is ENTER_THRESHOLD, and nothing forces anybody to enter the threshold.
The state of the system consists of two boolean variables
in_threshold and door_open

Enter_Threshold (*user enters the threshold*)
Preconditions: in_threshold = false
Effects: in_threshold := true

Open_Door (*user opens the door*)
Preconditions: in_threshold = true and door_open = false
Effects: door_open := true

Leave (*user leaves through open door*)
Preconditions: in_threshold = true and door_open = true
Effects: in_threshold := false; door_open := false

The Open_Door and Leave actions will occur in time t if
they are continuously enabled.

Figure 1.1: Door Closing System
Restore_Door (*shut the door if there is no user waiting*)
Preconditions: in_threshold = false and door_open = true
Effects: door_open := false

The Restore_Door action will occur in time t if it is
continuously enabled.

Figure 1.2: Extra action that models automatic door closing in a Door Closing System



We can model the addition of an automatic door closer using the action RESTORE_DOOR
shown in Figure 1.2. This door closer works by detecting whether there is anybody in 
the threshold. 
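The argument can be made concrete with a small Python sketch. It is an illustration only: the names `ok_door` and `enabled_actions` are hypothetical, and timing is not modelled. The sketch simply lists the actions enabled in each state. In the bad state (door open, nobody in the threshold), the only action from Figure 1.1 that is enabled is ENTER_THRESHOLD, and that action carries no timing guarantee. With RESTORE_DOOR added, an action that must occur within time t stays continuously enabled until someone enters the threshold. Once OK_Door holds, no action can falsify it.

```python
from itertools import product

# A state is a pair (in_threshold, door_open), as in Figures 1.1 and 1.2.

def ok_door(in_threshold, door_open):
    """OK_Door: if door_open = true then in_threshold = true."""
    return in_threshold or not door_open

def enabled_actions(in_threshold, door_open, with_closer):
    """Map each enabled action to the state it produces."""
    actions = {}
    if not in_threshold:
        actions["ENTER_THRESHOLD"] = (True, door_open)
    if in_threshold and not door_open:
        actions["OPEN_DOOR"] = (True, True)
    if in_threshold and door_open:
        actions["LEAVE"] = (False, False)
    if with_closer and not in_threshold and door_open:
        actions["RESTORE_DOOR"] = (False, False)
    return actions

# Closure: once OK_Door holds, no action (including RESTORE_DOOR) falsifies it.
for state in product([False, True], repeat=2):
    if ok_door(*state):
        for succ in enabled_actions(*state, with_closer=True).values():
            assert ok_door(*succ)

bad = (False, True)  # door open, nobody in the threshold
print(sorted(enabled_actions(*bad, with_closer=False)))  # ['ENTER_THRESHOLD']
print(sorted(enabled_actions(*bad, with_closer=True)))   # ['ENTER_THRESHOLD', 'RESTORE_DOOR']
```

Every action enabled in the bad state (with or without the closer) leads to a state that satisfies OK_Door. The closer is what guarantees that such an action actually happens within time t.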



1.2 Self-Stabilization using Domain Restriction

Consider Figures 1.1 and 1.2 again. Some readers may object that we caused the prob- 
lem by allowing door_open to be an independent variable. Isn't it possible to hardwire 
relationships between variables to avoid illegal states? In fact, such a technique is 
actually used in a revolving door! 

A revolving door (see Figure 1.3) can be modelled as having three states instead of 
four. A person enters the revolving door, gets into the middle of the door, and finally 
leaves. It is physically impossible to leave the door open and yet there is a way to exit 
through the door. 

This technique, which we will call domain restriction, is a simple but powerful 
method for removing illegal states in computer systems that contain a single shared 
memory. Consider two processes A and B that have access to a common memory 
as shown in the first part of Figure 1.4. Suppose the two processes should not run
concurrently because they can interfere with each other. Thus we would like to provide
mutual exclusion for the two processes. To achieve this we can pass a token between
A and B. A process can "run" only when it has the token.

Figure 1.3: Exiting through a revolving door (Step 1: entering; Step 2: middle of the door; Step 3: out at last!).

One way to implement this is to use two boolean variables token_A and token_B. The
token variable for a process is set to true whenever the process has the token. To pass
the token, Process A sets token_A to false and sets token_B to true. In a self-stabilizing
setting, however, this is not a good implementation. For instance, the system will
deadlock if token_A = token_B = false in the initial state (and mutual exclusion is
violated if both are true). The problem is that we have some extra and useless states.
The natural solution is to restrict the domain to a single bit called turn, such that
turn = 1 when A has the token and turn = 0 when B has the token. By using domain
restriction,¹ we ensure that any possible state is also a legal state.
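A small enumeration shows the difference between the two representations. This Python sketch is illustrative only, and the names `legal`, `holder` and `pass_token` are hypothetical. With two independent booleans, two of the four representable states are illegal. With the single bit turn, every representable value names exactly one holder.

```python
from itertools import product

def legal(token_a, token_b):
    """Exactly one of the two processes holds the token."""
    return token_a != token_b

# Two independent booleans: all four combinations are representable.
illegal = [s for s in product([False, True], repeat=2) if not legal(*s)]
print(illegal)  # [(False, False), (True, True)]

# Domain restriction: one bit; turn = 1 means A holds the token, 0 means B.
def holder(turn):
    return "A" if turn == 1 else "B"

def pass_token(turn):
    return 1 - turn

print([holder(t) for t in (0, 1)])  # ['B', 'A']
print(holder(pass_token(1)))        # B
```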

In this thesis, we will sometimes use domain restriction to avoid illegal states within 
a single node of a computer network. Domain restriction can be implemented in many 
ways. The most natural way is by restricting the number of bits allocated to a set 
of variables so that every possible value assigned to the bits corresponds to a legal 
assignment of values to each of the variables in the set. Another possibility is to 
modify the code that reads variables so that only values within the specified domain 
are read. Almost all the automata described in this thesis are finite state machines. 
Domain restriction can be performed (for finite state machines) by enumerating the 
legal states and then adding suitable checks to the code. 
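The following Python sketch illustrates the second option, restricting reads. Suppose a hypothetical variable has three legal values (for instance, the three positions of a revolving door) but is stored in a 2-bit register. One bit pattern is then representable yet illegal. The reading code maps that pattern to a fixed legal value; which legal value to pick is an assumption made only for this illustration.

```python
# Hypothetical example: legal values 0, 1, 2 stored in a 2-bit register,
# so the bit pattern 3 can appear (e.g. after corruption) but is illegal.
LEGAL_VALUES = (0, 1, 2)
REGISTER_BITS = 2

def read_restricted(raw):
    """Read the register so that only legal values are ever observed.

    Mapping the illegal pattern to LEGAL_VALUES[0] is an arbitrary choice
    made for this sketch.
    """
    raw &= (1 << REGISTER_BITS) - 1   # only REGISTER_BITS bits exist
    return raw if raw in LEGAL_VALUES else LEGAL_VALUES[0]

print([read_restricted(r) for r in range(4)])  # [0, 1, 2, 0]
```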

¹ In this example, we are really changing the domain. However, we prefer the term domain restriction.

Figure 1.4: Token passing between two processes

Figure 1.5: A typical mesh network

Unfortunately, domain restriction cannot solve all problems. Consider the same
two processes A and B that wish to achieve mutual exclusion. This time, however,
(see Figure 1.4, Part 2) A and B are at two different nodes of a computer network. 
The only way they can communicate is by sending token messages to each other. Thus 
we cannot use a single turn variable that can be read by both processes. In fact, A 
must have at least two states: a state in which A has the token, and a state in which 
A does not have the token. B must also have two such states. Thus we need at least 
four combined states, of which two are illegal. 

Thus domain restriction at each node cannot prevent illegal combinations across 
nodes. We need other techniques to detect and correct illegal states of a network. It 
should be no surprise that the title of Dijkstra's pioneering paper on self-stabilization 
[Dij74] was "Self-Stabilization in spite of Distributed Control." 

1.3 Self-Stabilization in Computer Networks

In this thesis, we will explore self-stabilization properties for computer networks. A 
computer network consists of nodes that are interconnected by communication chan- 
nels. The network topology (see Figure 1.5) is described by a graph. The vertices of 
the graph represent the nodes and the edges represent the channels. Nodes commu- 
nicate with their neighbors by sending messages along channels. Many real networks 
such as the ARPANET, DECNET and SNA can be modelled in this way. 

A network protocol consists of a program for each network node. Each program 
consists of code and inputs as well as local state. The global state of the network 
consists of the local state of each node as well as the messages on network links. We 
define a catastrophic fault as a fault that arbitrarily corrupts the global network state, 
but not the program code or the inputs from outside the network. 
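The definition can be pictured with a small data model. This Python sketch is hypothetical, and its class and function names do not come from the text. A catastrophic fault is modelled as a function that overwrites every node's local state and every link's contents with arbitrary values. It leaves the program code and the external inputs untouched. Capping each link at two garbage packets is an arbitrary choice that keeps the example finite.

```python
import random
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Node:
    code: Callable[..., Any]   # the program: never corrupted
    inputs: tuple              # inputs from outside the network: never corrupted
    state: dict                # local state: may be corrupted

@dataclass
class Network:
    nodes: dict                                   # node name -> Node
    links: dict = field(default_factory=dict)     # (u, v) -> packets on link u->v

def catastrophic_fault(net, rng, garbage):
    """Arbitrarily corrupt the global state: local states and link contents.

    Assumption of this sketch: at most two garbage packets per link.
    """
    for node in net.nodes.values():
        for key in node.state:
            node.state[key] = rng.choice(garbage)
    for link in net.links:
        net.links[link] = [rng.choice(garbage) for _ in range(rng.randint(0, 2))]
```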
Self-stabilization formalizes the following intuitive goal for networks: despite a history
of catastrophic failures, once catastrophic failures stop, the system should stabilize
to correct behavior without manual intervention. Thus self-stabilization is an abstrac- 
tion of a strong fault-tolerance property for networks. It is an important property of 
real networks because: 

• Catastrophic faults occur: Most network protocols are resilient to common
failures such as node and link crashes. However, many protocols cannot deal
with memory corruption. But memory corruption does happen from time to 
time. For example, alpha particles are a common cause of memory corruption. 
It is also hard to prevent a malfunctioning device from sending out an incorrect 
message that carries erroneous state information. The malfunctioning node can 
then crash leaving an incorrect message on a channel. 

• Manual intervention has a high cost: In a large decentralized network, 
restoring the network manually after a failure requires considerable coordination. 
As in the case of the AT&T network, the consequent network shutdown has a 
large dollar cost. Thus even if catastrophic faults occur rarely (say once a year),
there is considerable incentive to make network protocols self-stabilizing. In fact,
a reasonable guideline is what we call Lauck's Principle [Lau90]. This principle
states that the network should stabilize preferably before the user notices and
at least before the user logs a service call. This may seem facetious. However,
service calls are so expensive that this guideline is sometimes used to set timers
for self-stabilizing protocols!

These issues are illustrated by the crash of the original ARPANET protocol ([Ros81] 
[Per83]). The designers used a sequence number to distinguish newer topology updates 
from older ones. Because the set of sequence numbers was finite, they used a circularly 
ordered number space. Hence, it was possible to have three sequence numbers a, b, c
such that a > b > c > a. The protocol was carefully designed never to enter a state that
contained the three sequence numbers a, b, and c. Unfortunately, a malfunctioning
node injected three such updates into the network and crashed. After this the network
cycled continuously between the three updates. It took days of detective work [Ros81]
before the problem was diagnosed. With hindsight, the problem could have been
avoided by making the protocol self-stabilizing.
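Such a cycle is easy to produce with circular comparison. The following Python sketch uses a generic rule chosen for illustration; it is not claimed to be the ARPANET's actual comparison. With a number space of size N, a counts as newer than b when a is less than N/2 steps ahead of b. In a small space (N = 6, an arbitrary choice) the three numbers 4, 2 and 0 already form a cycle.

```python
def newer(a, b, modulus):
    """Return True if a is 'ahead of' b by less than half the number space."""
    d = (a - b) % modulus
    return 0 < d < modulus / 2

N = 6
a, b, c = 4, 2, 0
print(newer(a, b, N), newer(b, c, N), newer(c, a, N))  # True True True
```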
Self-stabilization is also attractive because a self-stabilizing program does not require
initialization. The concept of an initial state makes perfect sense for a single
sequential program. However, for a distributed program an initial state seems to be 
an artificial concept. How was the distributed program placed in such an initial state? 
Did this require another distributed program? Self-stabilization avoids these questions 
by eliminating the need for distributed initialization. 

Probably the most exciting reason for self-stabilization is that it can provide a uniform
approach towards fault-tolerance, thus leading to simplification as well as strengthening
of existing fault-tolerant protocols. This is because self-stabilization can subsume
such common fault models as link and node failures. However, in order to do so, the
self-stabilization recovery mechanisms must be fast enough to provide adequate response
time for such common failures.

There appears to be a hierarchy of faults ranging from very rare faults like memory 
corruption (that occur at most once every few days) to fairly common faults like link 
and node crashes (that may occur in the order of minutes) to very common faults like 
bit errors (that may occur every second). Thus it is adequate to recover from memory 
errors in the order of minutes, from link failures in the order of seconds, and from 
bit errors in the order of milliseconds. If the self-stabilization mechanism recovers too 
slowly (say in the order of minutes, as in [Per83]), then it is necessary to have separate, 
faster mechanisms to deal with common failures ([Per83]) like link and node failures. 

On the other hand, if the self-stabilizing mechanisms recover in the order of seconds,
then there is no need for separate mechanisms to deal with link and node failures. A
good example of an existing self-stabilizing protocol that meets this criterion is the
IEEE 802.1 bridge spanning tree protocol described in [Per85]. Clearly, reducing the
number of separate mechanisms leads to simpler protocols. It should be noted that it is
unlikely that self-stabilization will be efficient enough to subsume the need for all other
fault-recovery mechanisms; for example, most self-stabilizing protocols will probably
need some retransmission scheme to deal with messages lost due to bit errors. However,
the more mechanisms that can be subsumed, the simpler the resulting protocol.

In this thesis we will investigate methods for designing stabilizing protocols that 
have fast recovery times. Such protocols are not just faster and more fault-tolerant but 
also (by the arguments in the last paragraph) may be simpler than existing protocols. 
The example in Section 1.6.1 should clarify this point. 
1.4 Criticisms of Self-Stabilization

Despite the claims of the previous section, there are several peculiar features of the 
self-stabilization model that are often criticized. 

• The model allows network state to be corrupted but not program code. Isn't
this distinction artificial? After all programs and state variables are stored as
bits in memory.

• The model only deals with catastrophic faults that stop. There are other (e.g.,
Byzantine) models that deal with continuous faults. Aren't models that allow
continuous faults preferable?

• A self-stabilizing program P is only supposed to eventually produce correct
behavior. In the interim period, P is allowed to make mistakes. How can we make
use of a program that can sometimes make mistakes?

• Most self-stabilizing network protocols require periodic message traffic. Using
some theoretical measures of message complexity, the message complexity of a
self-stabilizing protocol is unbounded. Thus a theoretician may question whether
such protocols are worth the "cost".

We deal with each criticism in turn: 

1.4.1 Distinction between Program Code and State 

Program code can be protected against arbitrary corruption of memory by redundancy 
since code is rarely modified. Some static input (such as node IDs) can also be pro- 
tected in this way and can be considered to be part of the program. Some changing 
input (such as the list of neighboring nodes in a network) can be protected by requiring 
that such input be the output of another self-stabilizing protocol. On the other hand,
the state of a program is constantly being updated and it is not clear how one can 
prevent illegal operations on the memory by using checksums. It is even harder to 
prevent a malfunctioning node from sending out incorrect messages. 
1.4.2 Faults that Stop versus Faults that Continue

In the Byzantine [LSP82] fault model, some fraction of faulty nodes can continuously 
exhibit arbitrary faulty behavior. By allowing continuous faults, the Byzantine fault 
model appears to be stronger than the self-stabilization model. However, network 
protocols with Byzantine robustness [Per88] are expensive because they require large 
amounts of redundancy in storage, processing, and message traffic. On the other hand, 
it is possible to make many protocols self-stabilizing with a small cost in extra message
traffic and node processing. 

In Byzantine models, only a fraction of nodes are allowed to exhibit arbitrary 
behavior. In the self-stabilization model, all nodes are permitted to start with arbitrary 
initial states. Thus, neither model subsumes the other. In theory, there is no reason 
why a protocol cannot be robust against both Byzantine failures and arbitrary initial 
states. 

Assuming that faults stop in the self-stabilization model is only a modelling arti- 
fice. In practice, we only need faults to stop for a period long enough for the protocol 
to stabilize. Thus the self-stabilization model is especially appropriate for handling 
transient errors. 

1.4.3 Permitting Initial Errors 

A distributed database program cannot tolerate errors that may, for instance, wrongly 
credit an account with a million dollars! However, for most of the network protocols 
considered in this thesis, errors are not as critical. An example is a network protocol 
that computes routes between nodes. The nice thing about a routing protocol is that 
even if the network is completely fouled up, the worst thing that can happen is that 
network traffic stops for a while. Most of the stabilizing protocols described in this 
thesis are used for routing, scheduling, and resource allocation tasks. For such tasks, 
initial errors only result in a temporary loss of service. 

1.4.4 Periodic Message Sending in the Self-Stabilization Model

It is easy to show that any non-trivial self-stabilizing network protocol must send messages
periodically. Periodic sending of messages may seem extremely ugly. However,
in real networks, each node periodically sends control messages to its neighbors to detect
whether the neighbor is alive. For many of the protocols described in this thesis,
the periodic message sending required for stabilization can be piggybacked on such 
"keep-alive" message traffic without appreciable loss of efficiency. 

In a real implementation, periodic message sending is controlled by timers in order 
to keep the overhead bounded. We will not model these timers explicitly. A timer 
can be implemented in a self-stabilizing fashion as long as the hardware clock in every
node that's up continues to function. For instance, we can implement a timer using a 
counter that is incremented every time the hardware clock ticks. When the counter 
reaches its maximum value, the sending of a message is enabled, and the counter is 
reset to 0. Assuming that the hardware clock continues to tick is not at all restrictive. 
For most computers, if the hardware clock stops, the node has effectively crashed! 
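Here is a minimal Python sketch of such a timer. It assumes a hypothetical period parameter `MAX_COUNT` of five ticks and an unsigned counter. The test is `counter >= MAX_COUNT` rather than equality, so an out-of-range counter (for example, one left by memory corruption) enables a send on the very next tick. From then on, sends recur every `MAX_COUNT + 1` ticks, whatever the initial value.

```python
MAX_COUNT = 5  # hypothetical timer period parameter, in clock ticks

class PeriodicTimer:
    """Self-stabilizing timer: a counter incremented on every clock tick."""

    def __init__(self, initial_counter):
        # Arbitrary initial value. Assumption: the counter is an unsigned
        # register, so it is never negative.
        self.counter = initial_counter

    def tick(self):
        """Handle one hardware clock tick; return True if a send is enabled."""
        if self.counter >= MAX_COUNT:   # '>=' also catches out-of-range values
            self.counter = 0
            return True
        self.counter += 1
        return False

def send_ticks(initial_counter, n_ticks):
    """Ticks (numbered from 1) at which a message send is enabled."""
    timer = PeriodicTimer(initial_counter)
    return [t for t in range(1, n_ticks + 1) if timer.tick()]

print(send_ticks(0, 20))     # [6, 12, 18]
print(send_ticks(1000, 20))  # [1, 7, 13, 19]
```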

1.5 Brief History of Self-Stabilization

Self-stabilization was introduced by Dijkstra in a seminal paper [Dij74]. In Dijkstra's 
model, a network protocol is modelled using a graph. The nodes of the graph contain 
finite state machines. The protocol is asynchronous, and the asynchrony is modelled 
by an adversarial scheduler called a "demon". At each stage, the demon is allowed to 
choose an arbitrary node in the graph to make a move. In a single move, a node is 
allowed to read the state of its neighbors, compute, and then possibly change its state. 
In this setting, Dijkstra described three self-stabilizing mutual exclusion protocols.
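For concreteness, here is a Python sketch of Dijkstra's K-state protocol on a ring of n machines. Machine 0 is the distinguished machine; it holds a privilege when its value equals that of machine n-1. Every other machine i holds a privilege when its value differs from its left neighbour's. A random choice stands in for the demon. The choice K = n + 1 is an assumption that comfortably meets the usual requirement that K be at least the number of machines. Some machine is always privileged: if no machine i > 0 is privileged, all values are equal, and then machine 0 is. From an arbitrary start, the ring converges to states with exactly one privilege, which then circulates.

```python
import random

def privileged(x):
    """Indices of the machines that currently hold a privilege."""
    n = len(x)
    priv = [0] if x[0] == x[n - 1] else []               # distinguished machine
    priv += [i for i in range(1, n) if x[i] != x[i - 1]]  # all other machines
    return priv

def move(x, i, k):
    """Machine i makes its move (only called when i is privileged)."""
    if i == 0:
        x[0] = (x[0] + 1) % k   # distinguished machine: increment modulo K
    else:
        x[i] = x[i - 1]         # others: copy the left neighbour's value

def run(n, k, steps, seed):
    """Start from an arbitrary state; a random 'demon' picks who moves.

    Returns the number of privileged machines before each move.
    """
    rng = random.Random(seed)
    x = [rng.randrange(k) for _ in range(n)]
    counts = []
    for _ in range(steps):
        priv = privileged(x)    # never empty
        counts.append(len(priv))
        move(x, rng.choice(priv), k)
    return counts

counts = run(n=5, k=6, steps=500, seed=1)
print(set(counts[300:]))  # {1}
```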

After Dijkstra's initial paper, work on self-stabilization languished for many years. 
However, in this period, at least three researchers recognized the importance of the con- 
cept, and championed its cause. Gouda and his co-workers at the University of Texas 
produced a number of papers (e.g., [BGW87],[GM90], [AG90]) in this area. Lam- 
port's PODC address ([Lam84]) was probably responsible for awakening the interest 
of the theory community in self-stabilization. Independently, Tony Lauck, [Lau90], 
who is responsible for the architecture of DECNET, recognized the applicability of 
self-stabilization to real networks. At his insistence, self-stabilization was added as a 
requirement ([Per83]) for many DECNET protocols. 

After Lamport's PODC address, a number of papers began to appear in this area. 
The contributions of these papers fall into three categories: refinements of Dijkstra's 
model, solutions to specific tasks, and one general technique.

1.5.1 Refinements of Dijkstra's model 

In Dijkstra's model, a node is allowed to read the state of all its neighbors and change 
its own state, all in one move. This level of atomicity is hard to achieve in a real 
network. [BGW87] suggest a model in which at each step, the demon can allow 
an arbitrary subset of nodes to make a move. Later [DIM90] introduced a model 
in which a node communicates with its neighbors by reading or writing to certain 
shared registers. Also, in their model, reading and writing of the shared register 
(and local computation) are separate atomic steps that can be arbitrarily interleaved. 
Other papers [Per83, AB89, GM90, KP90] model communication between nodes by 
the explicit sending of messages. 

In Dijkstra's model, one node in the graph is assumed to be a "leader" in order to 
break symmetry. Dijkstra observed that some form of symmetry breaking is required 
for self-stabilizing mutual exclusion. Later models introduced other forms of symmetry
breaking. In [Per83, AKY90], each node is assumed to have a distinct ID. [IJ90] 
introduced the use of randomization. Finally, most papers in this area assume the 
model is completely asynchronous. No assumptions are made about how long it takes 
for actions to be performed. By contrast, [Per83, Per85] assume upper bounds on 
message delivery and node processing times. 

1.5.2 Existing solutions for Specific Tasks 

Dijkstra's paper concentrated on the task of mutual exclusion on rings. Subsequent 
papers (e.g., [BP89, DIM90, IJ90]) continued to work on self-stabilizing mutual exclusion,
but in different models. Solutions to other tasks have also appeared.

[Per83, SG89] describe self-stabilizing routing protocols to compute shortest path
routes between every pair of nodes in a network. [Per85, AG90, AKY90] describe
self-stabilizing protocols to compute a spanning tree in a network. [AB89, GM90]
show how to establish reliable communication between a pair of nodes over a physical 
channel. A reset protocol is a protocol that can be used to "reset" a network to a 
prespecified initial state; a snapshot protocol can be used to find a consistent global 
state of a network. [AG90, KP90] describe self-stabilizing snapshot and reset protocols.
[Spi88a] describes a self-stabilizing virtual circuit protocol.

1.5.3 Existing General Technique 

While many fundamental problems have been tackled, there is a lack of general meth- 
ods. We know of only one general technique other than the work described in this 
thesis. Katz and Perry [KP90] show how to stabilize a large class of distributed algo- 
rithms by centralized checking and correction at a leader. The main technical difficulty 
in this approach is finding a self-stabilizing method to do checking and correction. For
this purpose, [KP90] invented a self-stabilizing snapshot protocol. However, the need
for centralized checking makes the performance of this approach rather poor as we see 
in the next section. 



1.6 Local Checking and Correction: A Preview 

The major theme of this thesis is the design of new and efficient general methods 
for making protocols self-stabilizing. All our methods are based on what we call
local checking. Unlike [KP90], our methods are efficient because checking is local and 
decentralized. In this section, we give a preview of our ideas. 

1.6.1 Example of Checking and Correcting on a Single Link 
Subsystem 

To make the notion of local checking and correcting more concrete we quickly describe 
an example of a protocol that works between two nodes, a sender node and a receiver 
node (Figure 1.6) that are connected by two unidirectional links. The sender sends 
messages to the receiver who buffers the messages in a finite sized queue. Any message 
that arrives when the queue is full is dropped. A simple credit-based scheme can be
used to prevent messages from being dropped during normal operation.

Figure 1.6: Credit-Based Flow Control between a Sender and a Receiver

The sender (Figure 1.6) keeps a credit register which stores the current credits
available to the sender. Initially, assume that the receiver queue is empty and that the
sender's credit register is equal to the size of the receiver queue (say Max). The sender
only sends a message when the credit register is non-zero; after sending a message, the 
sender decrements its credit register by one. When the receiver removes a message 
from its queue, the receiver sends a CREDIT message back to the sender. When the 
sender receives such a CREDIT message, the sender increments its credit register by 
one. 

It is easy to see that after proper initialization, and assuming that no errors occur,
no messages will be dropped. Under such conditions, the following condition (which
we will later call a Local Predicate) holds at every instant. At any instant, denote
(see Figure 1.6) the value of the credit register by C, the number of messages in flight
by M, the number of messages in the queue by Q, and the number of credits in flight
by CR. Then it must be true that C + M + Q + CR = Max. Intuitively, this can be
seen by analogy to two banks that only transfer money between each other; assuming
no errors, the total amount of "money" (credits plus messages) in and between the
two banks must be conserved. This local predicate ensures that there is always room
in the queue for the messages in flight: since all four quantities are non-negative,
M + Q ≤ Max.
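The conservation argument above can be checked mechanically. The following Python sketch is a toy model built on our own assumptions (the class `CreditLink`, its method names, and the random choice of the next atomic action are all hypothetical, not the thesis's code). It asserts the local predicate C + M + Q + CR = Max after every step of a properly initialized run and counts dropped messages.

```python
import random
from collections import deque


class CreditLink:
    """Toy model of the credit scheme of Figure 1.6 (names are hypothetical)."""

    def __init__(self, max_q):
        self.max_q = max_q
        self.credits = max_q            # C: sender's credit register
        self.msgs_in_flight = deque()   # M: messages on the sender-to-receiver link
        self.queue = deque()            # Q: receiver's finite buffer
        self.credits_in_flight = 0      # CR: CREDIT messages on the return link
        self.dropped = 0

    def send(self, msg):
        if self.credits > 0:            # send only while holding a credit
            self.credits -= 1
            self.msgs_in_flight.append(msg)

    def deliver_msg(self):
        if self.msgs_in_flight:
            msg = self.msgs_in_flight.popleft()
            if len(self.queue) < self.max_q:
                self.queue.append(msg)
            else:
                self.dropped += 1       # a full queue drops the message

    def consume(self):
        if self.queue:
            self.queue.popleft()
            self.credits_in_flight += 1  # one CREDIT per removed message

    def deliver_credit(self):
        if self.credits_in_flight:
            self.credits_in_flight -= 1
            self.credits += 1

    def local_predicate(self):
        total = (self.credits + len(self.msgs_in_flight) + len(self.queue)
                 + self.credits_in_flight)
        return total == self.max_q


def run(steps=10_000, max_q=4, seed=1):
    rng = random.Random(seed)
    link = CreditLink(max_q)
    actions = [lambda: link.send("m"), link.deliver_msg,
               link.consume, link.deliver_credit]
    for _ in range(steps):
        rng.choice(actions)()           # arbitrary interleaving of the four actions
        assert link.local_predicate()
    return link


print(run().dropped)  # 0: with proper initialization no message is dropped
```

Because the predicate gives M + Q ≤ Max, the queue always has room when a message arrives, so the drop counter stays at zero. Starting instead from a corrupted register (for example, two extra credits) falsifies the predicate immediately.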

Unfortunately, this simple credit based scheme runs into trouble if the system is 
either improperly initialized or there are errors on the link. Link errors can result 
in lost or even (less likely) added credits. Credit loss can result in slowing down the 
sender and possibly even deadlock; credit addition can lead to continuous dropping of 
messages. 

We can make this protocol self-stabilizing by superimposing a periodic checking/correcting
process (see Figure 1.7) on the original protocol. This process is triggered
by a timer at the sender every few seconds. To initiate a checking phase, the
sender (see Figure 1.7) sends a Snapshot ([CL85]) request message to the receiver.
While checking, the sender also stops sending any messages on the link. When the
receiver receives such a message, the receiver sends back a response containing the
number of messages in its queue (Q) at the instant it sent the response. When the
sender gets the response, the sender checks whether Q + C = Max, where C is the
value of the credit register at the instant the response is received. If this condition is
false, the sender infers that the local predicate is violated and initiates a reset phase.

Figure 1.7: A Single Phase of Checking/Correction using Snapshots/Resets
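A minimal sketch of one checking phase follows, assuming FIFO links. Under that assumption the test Q + C = Max is the right one: because the sender sends no data while checking, no data message is in flight when the receiver answers, and every CREDIT the receiver sent before its response reaches the sender ahead of the response. The class `CheckedLink`, its packet names, and the random scheduler are hypothetical illustrations, not the thesis's protocol code.

```python
import random
from collections import deque


class CheckedLink:
    """Hypothetical model of the checking phase of Figure 1.7 over FIFO links."""

    def __init__(self, max_q, credits=None):
        self.max_q = max_q
        self.credits = max_q if credits is None else credits  # C, possibly corrupted
        self.to_rcv = deque()   # sender -> receiver: "DATA" or "SNAP_REQ"
        self.to_snd = deque()   # receiver -> sender: "CREDIT" or ("SNAP_RESP", q)
        self.queue = 0          # Q
        self.checking = False
        self.violations = 0

    # Sender actions
    def send_data(self):
        if self.credits > 0 and not self.checking:  # no data while checking
            self.credits -= 1
            self.to_rcv.append("DATA")

    def start_check(self):                          # fired by the sender's timer
        if not self.checking:
            self.checking = True
            self.to_rcv.append("SNAP_REQ")

    def sender_receive(self):
        if not self.to_snd:
            return
        pkt = self.to_snd.popleft()
        if pkt == "CREDIT":
            self.credits += 1
        else:                                       # ("SNAP_RESP", q)
            _, q = pkt
            if q + self.credits != self.max_q:      # local predicate violated
                self.violations += 1                # a reset phase would start here
            self.checking = False

    # Receiver actions
    def receiver_receive(self):
        if not self.to_rcv:
            return
        pkt = self.to_rcv.popleft()
        if pkt == "DATA":
            if self.queue < self.max_q:             # a full queue drops the message
                self.queue += 1
        else:                                       # "SNAP_REQ": report Q right now
            self.to_snd.append(("SNAP_RESP", self.queue))

    def consume(self):
        if self.queue:
            self.queue -= 1
            self.to_snd.append("CREDIT")


def run(credits, steps=5000, seed=3):
    rng = random.Random(seed)
    link = CheckedLink(max_q=4, credits=credits)
    actions = [link.send_data, link.start_check, link.sender_receive,
               link.receiver_receive, link.consume]
    for _ in range(steps):
        rng.choice(actions)()
    return link


print(run(credits=4).violations)  # 0: a correctly initialized link never fails

bad = CheckedLink(max_q=4, credits=6)  # two spurious credits
bad.start_check()
bad.receiver_receive()
bad.sender_receive()
print(bad.violations)  # 1
```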

To initiate a reset phase, the sender (see Figure 1.7) sends a Reset request message
to the receiver. As in checking, the sender also stops sending any messages on the
link until it gets a response.² When the receiver receives such a request, the receiver
empties its queue and sends back a response. When the sender gets the response, the
sender reinitializes its credit register to Max. In other words, the receiver reinitializes
its local state on sending the response and the sender reinitializes its local state on
receiving the response. It is not hard to see that if no errors occur during the reset
phase, the local predicate will hold at the end of the reset phase.
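The claim that the predicate holds after an error-free reset can be exercised on arbitrary corrupted starting states. In this sketch (hypothetical names, FIFO links assumed, channel contents reduced to counts of DATA and CREDIT packets), any data queued ahead of the request is absorbed before the receiver empties its queue, and any stale credits reach the sender before the response overwrites the register.

```python
from collections import deque


class ResetLink:
    """Hypothetical model of one reset phase (Figure 1.7), assuming FIFO links."""

    def __init__(self, max_q, credits, queue, data_in_flight, credits_in_flight):
        # An arbitrary, possibly corrupted starting state.
        self.max_q = max_q
        self.credits = credits
        self.queue = queue
        self.to_rcv = deque(["DATA"] * data_in_flight)
        self.to_snd = deque(["CREDIT"] * credits_in_flight)
        self.resetting = False

    def local_predicate(self):
        m = sum(p == "DATA" for p in self.to_rcv)
        cr = sum(p == "CREDIT" for p in self.to_snd)
        return self.credits + m + self.queue + cr == self.max_q

    def start_reset(self):              # sender sends Reset and stops sending data
        self.resetting = True
        self.to_rcv.append("RESET_REQ")

    def receiver_step(self):
        pkt = self.to_rcv.popleft()
        if pkt == "DATA":
            self.queue = min(self.queue + 1, self.max_q)  # a full queue drops
        else:                           # RESET_REQ: reinitialize, then respond
            self.queue = 0
            self.to_snd.append("RESET_RESP")

    def sender_step(self):
        pkt = self.to_snd.popleft()
        if pkt == "CREDIT":
            self.credits += 1
        else:                           # RESET_RESP: reinitialize on receipt
            self.credits = self.max_q
            self.resetting = False


def reset_phase(link):
    link.start_reset()
    while link.to_rcv:                  # stale DATA first, then RESET_REQ
        link.receiver_step()
    while link.resetting:               # stale CREDITs first, then RESET_RESP
        link.sender_step()
    return link


link = ResetLink(max_q=4, credits=7, queue=3, data_in_flight=2, credits_in_flight=5)
print(link.local_predicate())               # False: 7 + 2 + 3 + 5 != 4
print(reset_phase(link).local_predicate())  # True
```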

Several optimizations can be added to this basic scheme. For example, it is possible 
to avoid having the sender stop sending messages during a checking phase by keeping 
track of some extra state variables. For this particular example, it is also possible to 
avoid a separate reset phase; instead when the sender receives a snapshot response, 
the sender can locally correct the credit register to account for any discrepancy. 
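A sketch of that optimization, under the same FIFO and "no data while checking" assumptions as the checking phase: the receiver may remove messages after it answers, but each removal turns into a CREDIT still in flight behind the response, so at the moment the response arrives, Q + CR still equals the reported q and M is zero. Setting C to Max - q therefore restores the full local predicate. The function name and its return convention are hypothetical.

```python
def on_snapshot_response(credits, q, max_q):
    """Hypothetical sender handler that corrects in place instead of resetting.

    Assumes FIFO links, no data sent while the snapshot request is outstanding,
    and 0 <= q <= max_q. Returns (new_credits, discrepancy): a positive
    discrepancy means spurious credits were removed, a negative one means
    lost credits were restored.
    """
    expected = max_q - q
    return expected, credits - expected


print(on_snapshot_response(credits=6, q=1, max_q=4))  # (3, 3)
print(on_snapshot_response(credits=1, q=1, max_q=4))  # (3, -2)
print(on_snapshot_response(credits=3, q=1, max_q=4))  # (3, 0)
```

The discrepancy value is also what a sender might log, in the spirit of the debugging use of local checking mentioned later in this chapter.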



² Chapter 5 contains details of how this protocol deals with lost requests and responses.
We can now illustrate the point made earlier that self-stabilization can subsume
the need for other fault-tolerance mechanisms. If the checking/correction procedure
is activated fairly often (doing it once every second on a high speed link requires
negligible overhead), then there is no need for separate mechanisms when either the
sender or receiver nodes or the two links crash and recover. For a crash and recovery
we do nothing special. Clearly the local predicate can be violated by such actions.
However, after the next checking and correction phase the flow control scheme will
begin working correctly. In the interim, messages may be lost, but this interval is
comparable to the time most protocols take to reinitialize after a link recovers; during
that time those protocols do not provide service to the user either. Thus the final
scheme is both simple and fault-tolerant.

This fault-tolerant credit-based scheme was proposed by us (for use on high speed
links) at Digital Equipment Corporation ([CSV89]). Recently, we proposed a variant
of this scheme for hop-by-hop flow control on ATM³ at the ATM Forum meeting in
August 1993. Local checking and correction is practical!

1.6.2 Extending the Idea to a General Network 

Briefly, the rest of this thesis can be described as an extension of the simple link
checking and correction scheme described in the last subsection to general network
protocols.

So consider a network as shown in Figure 1.5. Recall that in the method of [KP90] 
there is a leader that periodically checks the network. If the leader discovers that the 
network is in an illegal state, the leader corrects the network by resetting it into a 
good state. Intuitively, centralized checking and correction is slow. It also has high 
message complexity. 

Instead, we divide the network into a number of overlapping link subsystems as 
shown in Figure 1.8. A link subsystem consists of a pair of neighboring nodes and the 
channels between them. We wish to replace the global, centralized checking of [KP90] 
with local, decentralized checking. The intent, of course, is to allow each link subsystem 
to be checked in parallel. This results in faster stabilization. 



³ ATM stands for Asynchronous Transfer Mode. In ATM, messages are fixed-size "cells". There can
also be multiple "circuits" per link, each of which must be independently flow-controlled and checked.
Figure 1.8: An Example of Two Overlapping Link Subsystems in a Network

We describe sufficient conditions under which these methods can be applied.
Intuitively, a network protocol is locally checkable if whenever the protocol is in a bad
state, some link subsystem is also in a bad state. Thus if the protocol is in a bad 
state, some link subsystem will be able to detect this fact locally. As in [KP90], we 
can correct a locally checkable protocol by doing (what we call) global correction of the 
network. However, in some cases we can do even better if the protocol is also locally 
correctable. 
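As a small illustration of local checkability (our own example, not a protocol from the thesis), take the goal "every node holds the same value" on a connected graph, and simplify a link subsystem's state to the values at its two endpoints, ignoring channel contents. If two nodes disagree, some edge on a path between them has endpoints that disagree, so some independent link check fails. The sketch also confirms this exhaustively for 0/1 values on a four-node graph.

```python
from itertools import product


def bad_subsystems(state, edges, local_ok):
    """Each link subsystem (u, v) checks only its own endpoints; checks are independent."""
    return [(u, v) for u, v in edges if not local_ok(state[u], state[v])]


def same(x, y):
    return x == y


# Hypothetical protocol goal: every node holds the same value (e.g., a shared epoch).
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "D")]  # a connected graph
good = {"A": 5, "B": 5, "C": 5, "D": 5}
bad = {"A": 5, "B": 5, "C": 7, "D": 5}
print(bad_subsystems(good, edges, same))  # []
print(bad_subsystems(bad, edges, same))   # [('B', 'C'), ('C', 'D')]

# On a connected graph the global predicate fails exactly when some subsystem fails.
nodes = ["A", "B", "C", "D"]
for values in product([0, 1], repeat=len(nodes)):
    state = dict(zip(nodes, values))
    globally_bad = len(set(values)) > 1
    assert globally_bad == bool(bad_subsystems(state, edges, same))
```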

Intuitively, a network protocol is locally correctable if the network can be corrected 
to a good state by each link subsystem independently correcting itself to a good state. 
Clearly, this is non-trivial because link subsystems overlap (see Figure 1.8) at nodes. 
In the figure, the correction of link subsystem S1 may cause subsystem S2 to become
incorrect.

1.6.3 Examples of Local Checking and Correction 

We will go through three simple examples to make these notions clearer. The first 
example is not locally checkable, the second is locally checkable but does not appear 
to be locally correctable, and the third is both locally checkable and locally correctable. 

For the first example, consider a token passing protocol in a line graph as shown
in Figure 1.9. The line is oriented such that A is at the leftmost end and X is at
the rightmost end. In normal operation, a single token is passed from left to right
(i.e., from A to X), and then back from right to left (i.e., from X back to A). Thus
each node in the line will receive the token periodically. This protocol is not locally
checkable if the graph has at least 3 nodes. Consider a typical link subsystem, for
example the subsystem between neighbors B and C in Figure 1.9. Clearly, in normal
operation it is possible for there to be no token at B, at C, or on the channels between
them. Thus having no tokens in a link subsystem is a legal state of a link subsystem.
But this means that if there is no token in the entire network, no subsystem can detect
this fact locally. Remember that subsystem checking is not coordinated.

Figure 1.9: Token Passing in an Oriented Line Graph is not Locally Checkable
Figure 1.10: Coloring a Cycle is Locally Checkable but not Locally Correctable
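The argument can be replayed in a few lines. Here "at most one token in the subsystem" is our own choice of local predicate for illustration; the point holds for any local predicate, because one that accepts the legal state with the token at A must also accept a B-C subsystem holding no token. The state layout and function names are hypothetical.

```python
NODES = ["A", "B", "C", "D"]
EDGES = list(zip(NODES, NODES[1:]))  # the line A - B - C - D


def make_state(token_at=None):
    nodes = {n: 0 for n in NODES}         # tokens held at each node
    if token_at is not None:
        nodes[token_at] = 1
    links = {e: 0 for e in EDGES}         # tokens on the channels of each edge
    return {"nodes": nodes, "links": links}


def subsystem_ok(state, u, v):
    """Hypothetical local predicate: at most one token at u, at v, or between them."""
    return state["nodes"][u] + state["nodes"][v] + state["links"][(u, v)] <= 1


def network_ok(state):
    """Global predicate: exactly one token in the whole line."""
    return sum(state["nodes"].values()) + sum(state["links"].values()) == 1


legal = make_state(token_at="A")
print(subsystem_ok(legal, "B", "C"))  # True: zero tokens is normal for B-C

no_token = make_state()
print(all(subsystem_ok(no_token, u, v) for u, v in EDGES))  # True: every check passes
print(network_ok(no_token))                                   # False: yet the state is bad
```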

For the second example, consider a protocol that colors the nodes of a cycle as 
shown in Figure 1.10. We require that the color of each node be either red, blue 
or green. We also require that the color of each node be different from that of its 
neighbors. Assume that in one atomic step a node can read the state of its neighbors 
and change its own state. However, steps of nodes can be arbitrarily interleaved. 

Then the protocol is locally checkable. Suppose that node A has the same color as 
a neighbor B. Then, this can be detected within the link subsystem containing A and 
B. 
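A minimal sketch of the local check for the cycle (names hypothetical): each link subsystem compares only its own two colors, and the global predicate, a proper 3-coloring, fails exactly when some subsystem's check fails. Note that the all-red state is detected by every subsystem; the difficulty discussed next lies in correcting it, not in detecting it.

```python
def cycle_edges(n):
    """Link subsystems of a cycle 0, 1, ..., n-1: one per pair of neighbors."""
    return [(i, (i + 1) % n) for i in range(n)]


def bad_links(colors):
    """Each subsystem compares only the colors of its own two nodes."""
    return [(u, v) for u, v in cycle_edges(len(colors)) if colors[u] == colors[v]]


print(bad_links(["red", "blue", "green", "blue"]))  # []
print(bad_links(["red", "blue", "blue", "green"]))  # [(1, 2)]
print(len(bad_links(["red"] * 5)))                  # 5: every subsystem detects it
```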

However, it is not clear how to make this protocol locally correctable. Suppose
that all nodes are initially red. Then by the symmetry of the initial state, it appears
that local corrections (or any corrections) are insufficient to correct the system to a
good state. We could break symmetry with randomization. However, in this thesis we
will only consider deterministic local correction procedures.

Consider the same problem of coloring the nodes of a graph, except that the graph
is an oriented line graph as shown in Figure 1.9. Then the protocol is locally checkable
and locally correctable. Suppose that two nodes in a link subsystem (say B and
C) have the same color. Then to correct the link subsystem, the color of the right node
(i.e., C) is changed to any legal color different from the color of the left node (i.e., B).
Assume that correction actions occur in bounded time after they are enabled. Then
within bounded time, node B will have a color different from that of A and will never
change its color from this point on. Then within bounded time after node B's color
stabilizes, node C will have a color different from that of B and will never change
its color from this point on. By induction, we can show that all nodes are colored
correctly in bounded time.
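The inductive argument can be exercised with a small simulation. In this sketch (hypothetical names; the random choice of which enabled correction runs next stands in for arbitrary interleaving), only the right node of a failing link subsystem changes, and it picks the first color that differs from its left neighbor. The leftmost node never changes, so each node eventually stops changing once its left neighbor has.

```python
import random

COLORS = ("red", "blue", "green")


def enabled(colors):
    """Right endpoints of link subsystems (i-1, i) whose local check fails."""
    return [i for i in range(1, len(colors)) if colors[i] == colors[i - 1]]


def correct(colors, i):
    # Hypothetical choice of "any legal color": the first one differing from the left node.
    colors[i] = next(c for c in COLORS if c != colors[i - 1])


def stabilize(colors, seed=0):
    """Apply enabled corrections one at a time in an arbitrary (random) order."""
    rng = random.Random(seed)
    colors = list(colors)
    steps = 0
    while True:
        todo = enabled(colors)
        if not todo:
            return colors, steps
        correct(colors, rng.choice(todo))
        steps += 1


colors, steps = stabilize(["red"] * 6, seed=42)
print(enabled(colors))  # []: a proper coloring; the exact colors depend on the order
```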

1.6.4 Why Local Checking is Useful 

It is perhaps surprising that a number of useful network protocols are both locally 
checkable and locally correctable. In subsequent chapters we will describe locally 
checkable and correctable protocols for mutual exclusion, network resets, and
end-to-end communication across an unreliable network. It may seem from the simple
examples that our method is confined to acyclic graphs; this is not true: both the
end-to-end and reset protocols work on arbitrary topologies. It also appears (see Chapter
7) that other existing protocols that work in dynamic networks (in which the topology 
can change due to link failures and recovery) are locally correctable. Protocols that 
are both locally checkable and correctable can be stabilized very quickly. 

Protocols that are locally checkable but work on a tree topology can be stabilized 
in time proportional to the height of the tree. Thus we can remove the need for 
local correctability if the underlying topology is a tree. Another way to remove the 
need for local correctability (without restricting the topology to a tree) is to pay 
a price in stabilization time. Protocols that are locally checkable but not locally 
correctable can be made self-stabilizing by doing global correction using the network
reset protocol developed in this thesis. The price for using global correction is that
the stabilization time now becomes proportional to the number of nodes. We describe
stabilizing spanning tree and topology update protocols that use local checking and 
global correction. 

We will also describe two compilers that can compile any deterministic synchronous
protocol π into a self-stabilizing asynchronous version of π. The first compiler is
stabilized by using local checking and correction, while the second compiler is stabilized 
using local checking and global correction. The significance of the compilers is that 
there are some network tasks (for example, computing a minimal spanning tree) for 
which a synchronous protocol exists but for which no locally checkable solution is 
known. Hence the compilers extend the range of our general techniques. 

Thus while local checking cannot be used to solve every problem, there are a large 
number of useful protocols that can be efficiently stabilized using this notion. There 
are several benefits to this approach: 

• The resulting protocols are efficient and stabilize quickly. 

• The approach allows us to understand how to design self-stabilizing protocols
in a systematic fashion. In fact, we will show that some existing self-stabilizing
protocols can easily be understood in this framework.

• The approach allows us to prove self-stabilization properties of protocols in a 
modular way. This is because we limit ourselves to proving properties of link 
subsystems instead of arguing about global states. 

• As a side benefit, local checking provides a useful debugging tool. Recall that 
each link subsystem periodically checks whether the subsystem is in a good 
state. Thus any violations can be logged for further examination. In a trial 
implementation of our reset procedure on the Autonet [MAM+90], local checking
discovered bugs in the protocol code. In the same vein, local checking can provide 
a record of catastrophic, transient faults that are otherwise hard to detect. 

1.7 Thesis Organization 

The thesis is organized into three major parts as illustrated in Figure 1.11. The first
part consists of three chapters on basic definitions and examples. The second part
contains three chapters on local checking and local correction. The third part consists
of two chapters on local checking and global correction. The final chapter presents our
conclusions and contains a list of open questions. There are also several appendices.
The first appendix is a list of frequently used notation. Next, there are appendices
containing some details of proofs that were omitted in the main text for clarity. Finally,
there is an index of commonly used terms and definitions. Figure 1.11 also summarizes
the major results of the thesis.

Figure 1.11: Thesis Organization

We now describe each of the major parts in more detail.

1.7.1 Basic Definitions and Examples: Chapters 2-4 

Chapter 2 describes our model of computation, a variant of the timed Input/Output 
automata model ([MMT91], [LT89]). The model is basically a state machine model
except that transitions are labelled with action names. By separating actions into 
internal and external actions, it is possible to define the correctness of an automaton 
in terms of its external behavior, where a behavior is a sequence of external actions. 
Chapter 3 contains our definitions of stabilization. Roughly, we say that an
automaton A stabilizes to some target set P of behaviors if every behavior of A has a
suffix that is in P. The intuition is that the behaviors of A eventually begin to "look
like" the behaviors in P. The actual definitions are slightly more complex in order
to define what it means to stabilize in bounded time. Our behavior definitions are
in contrast to previous definitions (e.g., [KP90]), which are in terms of the states and
executions of a system. We do have a definition of stabilization that corresponds to the
standard definition; we use the behavior definition for specification and the standard
definition for proofs.

Chapter 3 also contains our first important result. This is a Modularity Theorem 
that allows us to prove facts about the stabilization of a large system by proving facts 
about the stabilization of the system components. The theorem formalizes a "building 
block" approach to designing stabilizing protocols that we use throughout the thesis. 
Chapter 3 also describes a technique for proving stabilization properties. Chapter 3 is 
joint work with Nancy Lynch. 

Chapter 4 contains a quick example of local checking and correction in the simplified 
shared memory model introduced by Dijkstra in [Dij74]. Because Dijkstra's model is 
so simple, it allows us to strip away extraneous detail and focus on the main ideas 
behind local checking and correction. However, readers who wish to concentrate on 
results for more realistic network models should skip Chapter 4 and go directly to 
Chapter 5. From Chapter 5 onwards, we use a network model suitable for modelling 
real networks. Chapter 4 is based on work done by the author. Independently, Anish 
Arora and Mohamed Gouda from the University of Texas at Austin have obtained 
similar results. Joint publication is planned. 

1.7.2 Local Checking and Correction: Chapters 5-7 

Chapter 5 begins by introducing our network model. Our network model is basically 
the standard asynchronous message passing model except for one important twist: 
each link is restricted to store at most one packet at a time. We argue that bounded 
storage link models are essential in a stabilizing context. We also argue that our network
model can be easily implemented in real networks. The network model of Chapter 5
is used for the rest of the thesis.
The concept of local checking and correction was introduced in [APV91b], and
is joint work with Baruch Awerbuch and Boaz Patt. While these concepts are used 
throughout the thesis, [APV91b] did not present a formal description of the method. 
Instead [APV91b] described the method informally and showed how it could be used 
to stabilize two important protocols, one for end-to-end message delivery and one for 
network reset. 

Chapter 5 gives a formal basis for the method of local checking and correction in 
message passing systems. The chapter contains formal definitions of local checkability 
and local correctability in a network model. These definitions are used to state the 
main result of the chapter, the Local Correction Theorem. This theorem shows that 
any locally checkable and correctable protocol can be transformed into an equivalent 
stabilizing protocol. The stabilization time of the resulting system is proportional to 
the height of a certain partial order that is used in the definition of local correctability. 
Chapter 5 is joint work with Nancy Lynch. 

Chapter 6 applies the method of local checking to a simple mutual exclusion
protocol. Chapter 6 also contains an important result, the Tree Correction Theorem. This
theorem states that any locally checkable protocol on a tree can be efficiently stabilized
in time proportional to the height of the tree. In other words, if the underlying
topology is a tree we can dispense with the need for local correctability. The proof of this
theorem is only sketched because we prove a corresponding tree correction theorem 
for shared memory systems in Chapter 4. 

Chapter 7 links the second and third parts of the thesis by describing a stabilizing 
network reset protocol. Intuitively, a network reset protocol is a protocol that can 
be used by some other protocol P in order to restore P to a good state. Protocol 
P is given interfaces to make reset requests; the network reset protocol responds by 
providing reset signals at each network node. If each node (that implements P) locally 
initializes its state at the instant it receives a signal, then P will be restored to a good 
state. 

In order to use such a network reset protocol as a tool for building other stabilizing 
protocols (as we do in the third part of the thesis) the network reset must itself be 
stabilizing. Chapter 7 applies the method of local checking and correction to create a 
stabilizing network reset protocol as described in [APV91b]. Chapter 7 is joint work 
with Baruch Awerbuch and Boaz Patt. 
Chapter 7 also explores an interesting heuristic connection between locally
correctable protocols and protocols that work in dynamic networks where links can fail
and recover. The heuristic states that locally checkable protocols for dynamic networks 
can sometimes be made locally correctable. The basic idea is to use the link failure 
and recovery actions of the original protocol to locally correct link subsystems. This 
heuristic is the key to the proof of local correctability for our network reset protocol. 

1.7.3 Local Checking and Global Correction: Chapters 8-9 

The last part of the thesis contains two applications of global correction. 

In Chapter 8 we prove another major result, the Global Correction Theorem. This
theorem states that any locally checkable protocol can be stabilized in time
proportional to the number of network nodes. The Global Correction Theorem shows that
we can dispense with the need for local correctability and the need for the underlying
topology to be a tree as long as we are willing to pay a higher price in stabilization
time. The height of the underlying partial order in the local correction method and the
height of the tree in the tree correction method are typically smaller than the
number of network nodes. Thus it pays to use local correction or tree correction wherever
possible.

We present stabilizing protocols for computing a spanning tree and solving the 
topology update problem as examples of Global Correction. The spanning tree and 
topology update protocols are based on joint work with Baruch Awerbuch and Boaz 
Patt that is also described in [APV91a]. The protocols in Chapters 7 and 8 are both
efficient and practical and can be applied to real networks.

Chapter 9 develops two compilers that can convert any synchronous protocol π into
a self-stabilizing asynchronous version of π. The main compiler, the Resynchronizer,
works by first applying the synchronizer protocol of [Awe85] to create an asynchronous
version of π. Next we use global correction to make the resulting protocol stabilizing.
This can be done by using a stabilizing reset protocol to periodically restart an
asynchronous version of protocol π. The proof of the current version of the Resynchronizer
protocol is incomplete. However, we have a proof of a much more complicated version
of the Resynchronizer protocol that was originally reported in [AV91]. The
construction in this thesis is much simpler than our original construction in [AV91], but its
proof is incomplete. Thus Chapter 9 is best regarded as a set of useful ideas that need
polishing. Chapter 9 is joint work with Baruch Awerbuch.



1.8 Reading the Thesis 

Most chapters and long sections contain a roadmap at the start that explains the 
organization of the chapter or section. Similarly most chapters end with a summary 
of the important ideas in the chapter. Since it is easy to forget a piece of notation or 
a definition, the reader may also wish to consult the notation appendix and the index. 

Whenever the proof of a theorem or lemma is too long, we give an intuitive
explanation of why the theorem or lemma works in the main text, and provide more details
later or in the appendix. On a first reading, the reader is advised to skip the details. 

We believe that self-stabilization is useful and practical and hope that systems 
readers can also read this thesis. Chapter 2 is written to help readers unfamiliar 
with formal methods to get comfortable with the formalism we use. A systems reader 
wishing to get a quick summary of the results can read the introduction, summary and 
main theorems in each chapter. Once the reader gets comfortable with our method of 
describing protocols it should also be easy to read the actual code of the protocols. The 
complicated (and important) pieces of code are heavily commented and are preceded 
by informal descriptions. 
Chapter 2

The I/O Automaton Model 



A formal description of an algorithm is a precise and unambiguous description of the 
algorithm. Formal descriptions of sequential algorithms have proved to be useful. 
Distributed algorithms are more complicated than sequential algorithms because they 
have to deal with parallelism, asynchrony, and fault-tolerance. Thus we will often give
a formal presentation of the protocols in this thesis. 

By describing algorithms formally, we hope to describe them precisely so as to 
avoid ambiguity and to permit careful proofs of correctness. However, it is often hard 
to do so and yet convey the important ideas. We will try to combat this by providing 
intuitive explanations along with formal descriptions. 

To describe algorithms precisely, we use an underlying mathematical model. The 
idea is that after we model a real-life distributed algorithm, we can study the algorithm 
purely in terms of its mathematical properties. Despite this, we will often return to 
what these mathematical symbols represent. 

In this chapter we will describe the computation model used to describe the
distributed algorithms in this thesis. The first section of this chapter is an intuitive
introduction to the I/O automaton model. This section is written for readers
unfamiliar with the I/O automaton model. The second section of the chapter contains a
formal description of the variant of the I/O automaton model that we use in the rest 
of the thesis. Readers already familiar with the I/O automaton model may wish to 
only read the formal description in Section 2.2. 
2.1 The I/O Automaton Model

Our model of computation is a variant of the timed Input/Output automaton model 
[MMT], which in turn is based on the Input/Output automaton model of Lynch and 
Tuttle [LT89]. We will omit the word "timed" in what follows. For instance, we will 
refer to a timed I/O automaton simply as an automaton. 

2.1.1 Why use the I/O Automaton Model? 

We wish to model systems of processes that compute but also communicate with 
other processes. The processes do not have a common clock and so communication is 
asynchronous. A sequential algorithm computes some function of its input and then 
halts. By contrast, our processes can continuously receive input from and react to 
their environment. 

Consider the example of a token passing ring. Such rings are the basis of a number 
of local area networks that interconnect computers in offices and on college campuses. 
A token passing ring consists of a number of processes, say 0 to n - 1, connected
together in a ring. To prevent more than one process from transmitting at the same 
time, a token packet is passed from process to process. A process can transmit only 
when it has a token. A process passes its token to its clockwise neighbor a bounded 
time after it finishes transmitting. It is easy to see that the system works correctly if 
there is exactly one token in the system initially. 
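A deliberately simplified sketch of why the initial token count matters. We assume, for illustration only, a synchronous abstraction in which every token moves one hop clockwise per step (the real ring is asynchronous, with bounded holding times); `run_ring` is a hypothetical helper that records which processes may transmit at each step.

```python
def run_ring(n, holders, steps):
    """Hypothetical synchronous abstraction: each step, every token moves one hop clockwise.

    Returns, for each step, the sorted list of processes allowed to transmit.
    """
    holders = list(holders)
    log = []
    for _ in range(steps):
        log.append(sorted(holders))
        holders = [(i + 1) % n for i in holders]
    return log


one = run_ring(n=5, holders=[0], steps=10)
print(all(len(s) == 1 for s in one))                  # True: one transmitter at a time
print(sorted({s[0] for s in one}) == list(range(5)))  # True: every process gets a turn

two = run_ring(n=5, holders=[0, 2], steps=10)
print(any(len(s) > 1 for s in two))                   # True: two processes transmit at once
```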

The simplest model of the token passing system is a big state machine. Each node is
a state machine that sends and receives packets, and so is each channel between nodes.
The state of a channel is the sequence of packets stored in the channel. Unfortunately,
such monolithic models often do not have a property we call compositionality. A model 
is said to be compositional if we can infer the behavior of the system from the behavior 
of its components. This allows modular specification and modular proofs. 

The I/O automaton model is essentially a state machine model. However, it has a 
few extra features that make it a compositional model. 
Figure 2.1: The Interface to Process i and the Channel between Process i and Process i + 1 in a token
passing system (panels: Interface to a Process; Interface to a Channel)

2.1.2 Four Important Features of the I/O Automaton Model 

In the I/O automaton model, both systems and processes are modelled by an I/O 
automaton. An I/O automaton is an automaton (i.e., a state machine) with the 
following additional features. 

First, all state transitions of the automaton are labelled with names that are known 
as actions. Further, all actions are classified into three categories: input, output, 
and internal. Intuitively, input actions are actions caused by the external world or 
environment and to which the automaton must respond. Output actions are actions 
caused by the automaton and to which the environment must respond. Finally, internal 
actions are transitions that are neither input nor output actions: such actions only
change the internal state of the automaton.

As an example, Figure 2.1 shows a single process in the token passing system, say
Process i. Of course, Process i must have interfaces to send and receive packets. We
can model these interfaces using an input action RECEIVE_i(p) to receive a packet p
and an output action SEND_i(p) to send a packet p. In the figure, we have not shown
any internal actions. Figure 2.1 also shows the external interface to a channel between
Process i and Process i + 1. Notice that SEND_i(p) is an input action and RECEIVE_{i+1}(p)
is an output action for this channel. Intuitively, this corresponds to the fact that when
Process i sends a packet as an output action, that packet must simultaneously be
stored in the channel using a channel input action.

The classification of actions allows a simple scheme for "plugging" together
automata so that they can communicate. The formal name for this scheme is
composition. Composition is based on an idea we have already alluded to. When automata
are composed, the output actions of the automata are identified with input actions
of other automata that share the same name. As an example, suppose we compose
Process i with the channel between Process i and Process i + 1. Then the output action
SEND_i(p) of Process i gets identified with the input action SEND_i(p) of the channel.
When we run the new composed automaton, whenever Process i performs a SEND_i(p)
action, the channel will simultaneously perform a SEND_i(p) action. Thus the second
feature of the I/O automaton model is that automata communicate by simultaneous
performance of shared actions.

Figure 2.2: The token passing system formed by composition of processes and channels
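The following Python sketch is our own minimal model of these ideas, not the formal definitions of Section 2.2: the `Automaton` class and `perform` helper are hypothetical, packets are plain strings, and action names such as `SEND_0` flatten SEND_i(p) into a name plus a packet argument. Each automaton declares input, output and internal actions; when an owner performs an output, every other component with a same-named input performs it in the same atomic step, and inputs are accepted in every state. The composed state is simply the tuple of component states.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set


@dataclass
class Automaton:
    name: str
    inputs: Set[str]
    outputs: Set[str]
    internals: Set[str]
    state: dict
    # step(state, action, packet) returns the new state, or None if the action is
    # not enabled. Input actions must always return a state (input-enabledness).
    step: Callable[[dict, str, object], Optional[dict]]


def perform(system, owner, action, packet=None):
    """Owner performs a locally controlled action; every other component that has
    the same action as an input performs it in the same atomic step."""
    assert action in owner.outputs | owner.internals
    new_state = owner.step(owner.state, action, packet)
    if new_state is None:
        return False                       # not enabled in the owner's state
    owner.state = new_state
    if action in owner.outputs:            # internal actions are never shared
        for other in system:
            if other is not owner and action in other.inputs:
                other.state = other.step(other.state, action, packet)
    return True


def process_step(i):
    def step(s, action, packet):
        if action == f"SEND_{i}":          # output: enabled only while holding the token
            return {"token": False} if s["token"] else None
        if action == f"RECEIVE_{i}":       # input: always accepted
            return {"token": True}
        return None
    return step


def channel_step(i, j):
    def step(s, action, packet):
        q = s["queue"]
        if action == f"SEND_{i}":          # input: store the packet
            return {"queue": q + [packet]}
        if action == f"RECEIVE_{j}" and q and q[0] == packet:
            return {"queue": q[1:]}        # output: deliver the head packet
        return None
    return step


p0 = Automaton("P0", {"RECEIVE_0"}, {"SEND_0"}, set(), {"token": True}, process_step(0))
p1 = Automaton("P1", {"RECEIVE_1"}, {"SEND_1"}, set(), {"token": False}, process_step(1))
c01 = Automaton("C01", {"SEND_0"}, {"RECEIVE_1"}, set(), {"queue": []}, channel_step(0, 1))
system = [p0, p1, c01]

perform(system, p0, "SEND_0", "TOKEN")      # P0 and C01 step together
perform(system, c01, "RECEIVE_1", "TOKEN")  # C01 and P1 step together
print(p0.state, c01.state, p1.state)  # {'token': False} {'queue': []} {'token': True}
```

Because an output is controlled only by its owner, `perform` never asks the receiving components for permission; it simply applies their input transitions.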

Continuing our example, we can compose the automata for Process i, 0 ≤ i ≤ n - 1
and the channels between them to form a new automaton that represents the token 
passing system. This is shown in Figure 2.2. We assume that all arithmetic on process 
indices is mod n. 

Suppose an automaton A wishes to perform an output action π that is also an input
action of another automaton B. Some models allow automaton B to block its inputs in
certain states. If B can block input actions, then A must somehow "handshake" with
B before A can perform action π. This in turn implies that action π is jointly controlled
by A and B. In the I/O automaton model, things are much simpler because of a third 
feature of the model. Input actions are enabled in every state of an automaton. Thus 
there is no need for handshaking, and an action is controlled solely by the originator 
of the action. 

The assumption that inputs cannot be blocked is extremely natural for the message 
passing systems studied in this thesis. In real message passing systems, a process must 
be prepared to receive packets in any state. Of course, a process may choose to drop 
a packet when it receives it. On top of this basic model, processes may choose to 
implement a flow control scheme to prevent senders from sending to receivers when 
the receivers are unable to process packets. However, such "flow control" is not part
of the basic I/O automaton model. 

Output and internal actions of an automaton A are under the control of A; hence
they are also called locally controlled actions. Typically, we need to specify certain
"liveness" guarantees on the performance of locally controlled actions. For example,
in the token passing system we need to specify that a node holds the token for a
bounded amount of time. We specify such guarantees using a fourth feature of timed
automata. The locally controlled actions of an automaton are partitioned into a number
of equivalence classes. Each class c in this partition has a time t_c, which represents an
upper bound on the time to perform an action in class c.

Intuitively, each class represents the set of actions under the control of one system 
component. The automaton will guarantee "fair turns" to the enabled actions in each 
class. Suppose some action of class c is enabled at time t. One can think of the 
automaton as having a scheduler that checks each class c at least once every t_c time
units. If some action of class c is enabled when the scheduler checks the class, then some
action in the class is performed. More precisely, we require that either some action
of class c will be performed by time t + t_c, or no action of class c is enabled in some
state that occurs by time t + t_c. Returning to our example, we specify that the
SEND_i(p) actions of each Process i form a separate class with associated class time, say t_n.
We do so because each Process i is a distinct system component that controls the 
sending of its own messages. 

When we compose automata, the state set of the resulting automaton is the cross
product of the state sets of the component automata. More interestingly, the
timing partition of the new automaton is the union of the timing partitions of the
component automata. Thus composition preserves the timing guarantees of the
constituent automata.

2.1.3 Specifying Correctness in the I/O Automaton Model 

How do we specify the correctness of an automaton? We take the token passing system 
as an example. Our first attempt may be to specify correctness in terms of a set of 
legal states: a token passing system is correct if in any state there is exactly one token. 
But this allows implementations in which the token always remains at Process 1. Thus 
we also need to specify that every process receives a token within a bounded amount
of time. To do this, we need to describe executions of the system and to model how 
time passes. 

To model how time passes, we use the concept of a timed state and a timed action. 
Intuitively, a timed state is a pair (s,t) where s is a state of the automaton and t is a 
time; t is read as the time associated with state s. Similarly, a timed action is a pair 
(a, t) where a is an action of the automaton. 

When an automaton "runs", it generates a string representing an execution of the 
system the automaton models. This string is simply an alternating sequence of timed 
states and timed actions, that begins with a timed start state. An execution must 
respect the state transition rules of the automaton and the timing guarantees specified 
by each class of the automaton. Section 2.2 contains a more precise description. 

Using this definition of an execution, we can say that a token passing system S is 
correct if for every execution α of S:

• There is exactly one token in any state of α.

• Within a bounded time after any state of α, every Process i receives a token.
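As an illustration only, the two conditions can be tested on a finite prefix of an execution. The sketch below assumes each state is summarized by a tuple `tokens` giving the number of tokens held by each process, and treats `bound` as a hypothetical concrete value for the unspecified bounded time; windows that run past the end of the prefix cannot be judged and are skipped. The function name `legal_token_prefix` is likewise hypothetical.

```python
def legal_token_prefix(states, n, bound):
    """states: list of (tokens, time) pairs with non-decreasing times, where
    tokens[i] is the number of tokens at Process i in that state."""
    # Condition 1: exactly one token in every state.
    if any(sum(tokens) != 1 for tokens, _ in states):
        return False
    # Condition 2: within `bound` after each state, every process holds a token.
    end = states[-1][1]
    for k, (_, t) in enumerate(states):
        if t + bound > end:
            break  # this window (and every later one) is not fully observed
        for i in range(n):
            window = (tok for tok, u in states[k:] if u <= t + bound)
            if not any(tok[i] > 0 for tok in window):
                return False
    return True


ring = [((1, 0, 0), 0), ((0, 1, 0), 1), ((0, 0, 1), 2),
        ((1, 0, 0), 3), ((0, 1, 0), 4), ((0, 0, 1), 5)]
stuck = [((1, 0, 0), t) for t in range(6)]  # token never leaves Process 0
print(legal_token_prefix(ring, 3, 2), legal_token_prefix(stuck, 3, 2))  # True False
```

The `stuck` prefix satisfies the first condition but not the second, which is exactly the implementation the single-token condition alone fails to rule out.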

Specifying correctness using a set of legal executions is a reasonable solution that 
we will use sometimes. However, the correctness specification refers to the state of the 
implementing system. Ideally, we should treat the implementing system as a "black 
box" and describe its correctness in terms of its externally visible behavior. 

Suppose we wished our token passing system to be used by other applications. Then 
we need to specify additional actions to act as an interface to such client applications. 
To do so we add two additional actions to each Process i. The first is an output action
DELIVER_TOKEN_i and the second is an input action RETURN_TOKEN_i. This is shown
in Figure 2.3. Intuitively, DELIVER_TOKEN_i is used by Process i to deliver a token to
the external client; RETURN_TOKEN_i is used by the external client to return the token
to Process i. Suppose we now compose all process and channel automata. Next, we
reclassify all actions of the composition as internal actions except the DELIVER_TOKEN
and RETURN_TOKEN actions. Such a reclassification can be done formally using a
"hide" operator. 

Figure 2.3: Process i with additional interfaces to an external client
Figure 2.4: A modular token passing system and its external interface to clients

The resulting automaton is shown in Figure 2.4. It essentially reflects the interface 
between the token passing system and its clients. A natural question is: can we specify 
correctness of the system solely in terms of the external interface to the clients? 

To answer this question, we first define an external behavior (or behavior for short) 
of an automaton. A behavior corresponding to an execution α is the subsequence of
α consisting only of timed input and output actions, together with a start time. The
start time of the behavior is equal to the time associated with the first state of α. A
sequence β is said to be a behavior of automaton A if β is the behavior corresponding
to some execution of A. Clearly any behavior of the automaton in Figure 2.4 will
consist only of DELIVER_TOKEN and RETURN_TOKEN actions.

Using the notion of a behavior, we can define the "external" correctness of the 
token passing system as follows. A token passing system S is said to be correct if for 
any behavior β of S the following two properties hold in the sequence β:

• There must be a RETURN_TOKEN_i between any DELIVER_TOKEN_i and a later
DELIVER_TOKEN_j, for any i, j.

• Suppose that in β, a RETURN_TOKEN_i occurs in bounded time after every
DELIVER_TOKEN_i, for all i. Then for any j and any suffix γ of β, a
DELIVER_TOKEN_j will occur in bounded time after the start of γ.
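Only the first property can be refuted by a finite prefix; the second concerns every suffix and bounded times, so it cannot be checked this way. A hedged Python sketch of a checker for the first property follows, assuming the untimed behavior is given as a list of pairs ("DELIVER", i) and ("RETURN", i) standing for DELIVER_TOKEN_i and RETURN_TOKEN_i; the function name `satisfies_safety` is hypothetical.

```python
def satisfies_safety(behavior):
    """Check the first property on a finite untimed behavior.

    behavior: list of ("DELIVER", i) or ("RETURN", i) pairs. Returns False
    as soon as some DELIVER_TOKEN_j occurs while an earlier DELIVER_TOKEN_i
    has not yet been followed by a RETURN_TOKEN_i.
    """
    outstanding = set()  # indices i whose DELIVER_TOKEN_i is not yet returned
    for kind, i in behavior:
        if kind == "DELIVER":
            if outstanding:
                return False
            outstanding.add(i)
        elif kind == "RETURN":
            outstanding.discard(i)
    return True


ok = [("DELIVER", 1), ("RETURN", 1), ("DELIVER", 2), ("RETURN", 2)]
bad = [("DELIVER", 1), ("DELIVER", 2)]
print(satisfies_safety(ok), satisfies_safety(bad))  # True False
```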

The first condition is a "safety" property. It guarantees that Process j will not 
receive the token until all processes that have received the token before j have returned 
the token. The second property is a "liveness" property. It ensures that Process j
will get the token periodically. However, we can only guarantee this property if all 
external clients return the token in bounded time after a token is delivered to them. 
Notice a modelling trick that is being used here. While an I/O automaton A must 
allow all possible inputs, we can specify that A exhibit correct behavior only on certain 
"well-formed" inputs. 

2.2 Formal Summary of the I/O Automaton Model 

We summarize our discussion so far. In this thesis, we will use the following model 
which is a special case of the model in [MMT91]. However, our terminology is slightly 
different from that of [MMT91]. 

An automaton A consists of five components:

• A finite set of actions actions(A) that is partitioned into three sets, called the sets
of input, output, and internal actions. The union of the set of input actions and
the set of output actions is called the set of external actions. The union of the
set of output and internal actions is called the set of locally controlled actions.

• A finite set of states called states(A).

• A nonempty set start(A) ⊆ states(A) of start states.

• A transition relation R(A) ⊆ states(A) × actions(A) × states(A) with the property
that for every state s and input action a there is a transition (s, a, s') ∈ R(A).

• An equivalence relation part(A) partitioning the set of locally controlled actions
into equivalence classes, such that for each class c in part(A) we have a positive
real upper bound t_c. (Intuitively, t_c denotes an upper bound on the time to
perform some action in class c.)

An action a is said to be enabled in state s of automaton A if there exists some
s' ∈ states(A) such that (s, a, s') ∈ R(A). An action a is disabled in state s of automaton
A if it is not enabled in that state. Since one action may occur multiple times in a
sequence, we often use the word event to denote a particular occurrence of an action
in a sequence.
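The five components and the notion of an enabled action can be encoded directly for small finite automata. In the Python sketch below, which is an assumed encoding rather than anything prescribed by [MMT91] or by this thesis, actions and states are hashable values, R(A) is a set of triples, and part(A) is a tuple of (class, t_c) pairs; the class `Automaton` and the methods `enabled` and `well_formed` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Automaton:
    inputs: frozenset
    outputs: frozenset
    internals: frozenset
    states: frozenset
    start: frozenset
    transitions: frozenset  # triples (s, a, s2), i.e. the relation R(A)
    part: tuple             # pairs (frozenset of actions, upper bound t_c)

    def actions(self):
        return self.inputs | self.outputs | self.internals

    def enabled(self, s, a):
        """a is enabled in s if some triple (s, a, s2) is in R(A)."""
        return any(x == s and b == a for (x, b, _) in self.transitions)

    def well_formed(self):
        acts = self.actions()
        # Input, output and internal actions must be pairwise disjoint.
        if len(self.inputs) + len(self.outputs) + len(self.internals) != len(acts):
            return False
        # start(A) is a nonempty subset of states(A).
        if not self.start or not self.start <= self.states:
            return False
        # R(A) is a subset of states(A) x actions(A) x states(A).
        for (s, a, s2) in self.transitions:
            if s not in self.states or a not in acts or s2 not in self.states:
                return False
        # Every input action is enabled in every state.
        if not all(self.enabled(s, a) for s in self.states for a in self.inputs):
            return False
        # part(A) partitions the locally controlled actions into nonempty
        # classes, each with a positive upper bound t_c.
        classes = [c for c, _ in self.part]
        local = self.outputs | self.internals
        if not all(classes) or frozenset().union(*classes) != local:
            return False
        if sum(len(c) for c in classes) != len(local):
            return False
        return all(t_c > 0 for _, t_c in self.part)


toy = Automaton(
    inputs=frozenset({"req"}), outputs=frozenset({"ack"}), internals=frozenset(),
    states=frozenset({0, 1}), start=frozenset({0}),
    transitions=frozenset({(0, "req", 1), (1, "req", 1), (1, "ack", 0)}),
    part=((frozenset({"ack"}), 1.0),),
)
print(toy.well_formed(), toy.enabled(1, "ack"), toy.enabled(0, "ack"))  # True True False
```

Dropping the triple (1, "req", 1) would make `well_formed` fail, since the input `req` would then be blocked in state 1.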

To model the passage of time we use a time sequence. A time sequence t_0, t_1, t_2, ...
is a non-decreasing sequence of non-negative real numbers; also the numbers grow 
without bound if the sequence is infinite. A timed element is a tuple (x,t) where t is a 
non-negative real and x is an element drawn from an arbitrary domain. A timed state 
for automaton A is a timed element (s,t) where s is a state of A. A timed action for 
automaton A is a timed element (a, t) where a is an action of A. 

Let X = (x_0, t_0), (x_1, t_1), ... be a sequence of timed elements, and write X_j for its
element (x_j, t_j). We will also use X_j.time (which is read as the time associated with
element X_j) to denote t_j.

We say that element X_j occurs within time t of element X_i if j > i and X_j.time ≤
X_i.time + t. We will use X.start (which is read as the start time of X) to denote t_0.

Definition 2.2.1 An execution α of automaton A is an alternating sequence of timed
states and actions of A of the form (s_0, t_0), (a_1, t_1), (s_1, t_1), (a_2, t_2), (s_2, t_2), ... such that
the following conditions hold:

1. s_0 ∈ start(A) and (s_i, a_{i+1}, s_{i+1}) ∈ R(A) for all i ≥ 0.

2. The sequence can either be finite or infinite, but if finite it must end with a timed
state.

3. The sequence t_0, t_1, t_2, ... is a time sequence.

4. If any action in any class c is enabled in any state s_i of α, then by time
s_i.time + t_c either some action in c occurs or some state s_j occurs in which every
action in c is disabled.
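Definition 2.2.1 can be checked on a finite sequence, except that the requirement that time grow without bound only makes sense for infinite executions. The sketch below is illustrative: it assumes a finite automaton given by its start set, its transition relation as a set of triples, and a list of (class, t_c) pairs, and it judges condition 4 only for deadlines that fall inside the observed prefix. The function name `is_execution_prefix` is hypothetical.

```python
def is_execution_prefix(states, actions, times, start, trans, classes):
    """states = [s0, ..., sk], actions = [a1, ..., ak], times = [t0, ..., tk].
    Action a_i carries time t_i, the time of the state it leads to, so the
    sequence always ends with a timed state (condition 2)."""
    if len(times) != len(states) or len(actions) != len(states) - 1:
        return False
    # Condition 1: start state and legal transitions.
    if states[0] not in start:
        return False
    if any((states[i], actions[i], states[i + 1]) not in trans
           for i in range(len(actions))):
        return False
    # Condition 3: non-negative, non-decreasing times.
    if times[0] < 0 or any(b < a for a, b in zip(times, times[1:])):
        return False

    def enabled(s, a):
        return any(x == s and b == a for (x, b, _) in trans)

    # Condition 4: class upper bounds, judged only when the deadline
    # s_i.time + t_c lies inside the observed prefix.
    end = times[-1]
    for i, s in enumerate(states):
        for acts, bound in classes:
            if not any(enabled(s, a) for a in acts):
                continue
            deadline = times[i] + bound
            if deadline > end:
                continue
            served = any(
                actions[j - 1] in acts
                or not any(enabled(states[j], a) for a in acts)
                for j in range(i + 1, len(states))
                if times[j] <= deadline
            )
            if not served:
                return False
    return True


R = {(0, "req", 1), (1, "req", 1), (1, "ack", 0)}
C = [({"ack"}, 1.0)]
print(is_execution_prefix([0, 1, 0], ["req", "ack"], [0.0, 0.5, 1.2], {0}, R, C))  # True
print(is_execution_prefix([0, 1, 1], ["req", "req"], [0.0, 0.5, 2.0], {0}, R, C))  # False
```

In the second sequence `ack` is enabled from time 0.5 but neither occurs nor becomes disabled by the deadline 1.5, so condition 4 is violated.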

Notice that the time assigned to any event a_i in α (i.e., a_i.time) is equal to the
time assigned to the next state (i.e., s_i.time). Notice also that condition 3 rules out the
possibility of so-called "Zeno executions", in which the execution is infinite but time
stays within some bound.

Definition 2.2.2 Consider an execution α of A. Let γ be the subsequence of α consisting
of timed external actions, and let t_0 be the start time of α. The behavior β
corresponding to α is the sequence β = t_0, γ.
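Computing the behavior of a finite execution is a simple projection. The following sketch uses a hypothetical function `behavior_of` and encodes an execution by its list of actions a_1, ..., a_k and its list of times t_0, ..., t_k, where action a_i carries the time t_i of the state it leads to; `tick` below stands for an assumed internal action.

```python
def behavior_of(actions, times, external):
    """Definition 2.2.2: beta = t_0, gamma, where gamma lists the timed
    external actions of the execution. actions[i] is a_{i+1}, whose time
    is times[i + 1]."""
    gamma = [(a, times[i + 1]) for i, a in enumerate(actions) if a in external]
    return times[0], gamma


beta = behavior_of(["req", "tick", "ack"], [2.0, 2.5, 3.0, 3.5],
                   external={"req", "ack"})
print(beta)  # (2.0, [('req', 2.5), ('ack', 3.5)])
```

The start time 2.0 is kept even though no external action occurs at that time, which is why a behavior carries its start time separately.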

Notice that the start time of a behavior is the start time of the corresponding 
execution. The behaviors of automaton A are the behaviors corresponding to the 
executions of A. 

Notice that a behavior is not the same "type" of sequence as an execution since a 
behavior consists of a start time followed by a sequence of timed actions. Formally, 
a behavior sequence β is a sequence t_0, (a_1, t_1), (a_2, t_2), ... such that each a_i is drawn
from some set of actions and such that t_0, t_1, t_2, ... is a time sequence. Note that any
behavior of an automaton is a behavior sequence. We will use β.start to denote the
first element in β; β.start can be read as the start time of behavior sequence β. As
before, we will use a_j.time to denote t_j.

Consider an execution α = (s_0, t_0), (a_1, t_1), (s_1, t_1), (a_2, t_2), .... The untimed
execution corresponding to α is the sequence s_0, a_1, s_1, a_2, s_2, .... For brevity, we will
frequently describe execution α by the corresponding untimed execution s_0, a_1, s_1, ....
By our notation, the time associated with any state s_i (or action a_j) in α is s_i.time
(or a_j.time).

Similarly consider a behavior β = t_0, (b_1, t_1), (b_2, t_2), .... The untimed behavior
corresponding to β is the sequence b_1, b_2, .... We will often describe behavior β using the
corresponding untimed behavior b_1, b_2, .... Once again by our notation, the start time
of β is β.start and the time associated with any action b_j in β is b_j.time.

Notice that the time associated with the first state of an execution, s_0.time, is
allowed to be an arbitrary non-negative real number. As we see below, this assumption 
allows a clean statement of an important lemma about stabilizing automata. In the 
timed automaton model [MMT91], each class also has an associated lower bound. In 
our model, the lower bound associated with each class is implicitly assumed to be zero. 
These two assumptions (or lack of assumptions) restrict us to modelling systems in 
which the value of time is not used by the protocol. In the protocols we describe later, 
we will use time only to model liveness guarantees and to measure time complexity. 

The following lemma is useful later. 

Lemma 2.2.3 Consider any execution α of an automaton A. Any suffix of α that
starts with a timed start state is also an execution of A.

Proof: In essence, the lemma follows because we have no lower bounds on the time
between actions. ∎

2.2.1 Composition and Hiding 

To describe a collection of automata we will use a finite index set, say I. For example, 
an index set could be the set of vertices in a graph, or the set of edges in a graph. 
Thus we often speak of a collection {A_i, i ∈ I} of automata, where I is an index set.
Often we are interested in composing such a collection of automata. 

Before automata can be composed they must obey certain restrictions. Clearly, 
the automata cannot share internal or output actions without violating the principle 
that each action is controlled by exactly one automaton. A collection of automata 
{A_i, i ∈ I} is said to be compatible if the collection is finite, the sets of output actions
of the automata are pairwise disjoint, and the internal actions of any automaton are disjoint from the
actions of any other automaton. Notice that this allows several automata to have a 
common input action. This can be used, for instance, to model a broadcast from one 
automaton to several other automata. 

Let I be a finite index set. The composition of a compatible collection {A_i, i ∈ I}
is denoted by A = ∏_{i∈I} A_i. A is the automaton formed as follows:

• An action π is an input action of A if π is an input action of some A_i in the
collection and is not an output action of any A_j in the collection. The set
of output actions of A is the union of the output actions of the collection. The
set of internal actions of A is the union of the internal actions of the collection.

• The set of states of A is the cross-product of the state sets of the automata in 
the collection. The set of initial states of A is the cross-product of the initial 
state sets of the automata in the collection. 

• Let s|i denote the projection of some state s of A onto automaton A_i. Then the
transitions of A are the triples (s, a, s') such that for every i ∈ I:

- If a is an action of A_i then (s|i, a, s'|i) is a transition of A_i.

- If a is not an action of A_i then s'|i = s|i.

• The partition of A is the union of the partitions in the collection. Thus if an 
action a belongs to some class c of any A_i, then a belongs to class c in A. The
upper bound corresponding to class c in A is the upper bound corresponding to
class c in A_i.
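For finite action sets, compatibility and the action signature of the composition can be computed directly. The sketch below covers only the signature (not states, transitions, or the timing partition); each automaton is represented by a hypothetical triple (inputs, outputs, internals), and the action names, including the internal action `UPDATE_0`, are made up for illustration.

```python
def compatible(sigs):
    """sigs: finite list of (inputs, outputs, internals) sets, one per automaton."""
    for i, (_, out_i, int_i) in enumerate(sigs):
        for j, (in_j, out_j, int_j) in enumerate(sigs):
            if i == j:
                continue
            if out_i & out_j:
                return False  # two automata would control the same output
            if int_i & (in_j | out_j | int_j):
                return False  # an internal action is shared with another automaton
    return True


def compose_signature(sigs):
    """Inputs of the composition are inputs of some component that are not
    outputs of any component; outputs and internals are plain unions."""
    outputs = set().union(*(o for _, o, _ in sigs))
    internals = set().union(*(h for _, _, h in sigs))
    inputs = set().union(*(i for i, _, _ in sigs)) - outputs
    return inputs, outputs, internals


process0 = ({"RECV_0"}, {"SEND_0"}, {"UPDATE_0"})
channel0 = ({"SEND_0"}, {"RECV_1"}, set())
sigs = [process0, channel0]
print(compatible(sigs), compose_signature(sigs))
```

Here SEND_0 is an output of the process and an input of the channel, so it appears only among the outputs of the composition.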

Note: Suppose we simply took the set of input actions of A to be the union of 
the set of input actions of the components. Then any action that is an input action of 
say A_i and an output action of say A_j will be classified both as an input and output
action of A. It is to avoid this problem that the input actions of A are defined the way 
they are. Notice also that any action of the component automata is also an action of 
the composition. 

We return to our claim that the I/O automaton model is compositional. We would 
like to show that the behavior of a composition can be inferred from the behavior of 
its components. We translate the following two lemmas from [MMT]. 

We use β|A_i to represent the projection of a behavior β of A onto some constituent
automaton A_i. The projection is the subsequence of β containing actions of A_i. We also
assume that β|A_i inherits the times of β in the natural way. Thus the time associated
with any action in β|A_i is the time associated with the corresponding action in β, and
the start time of β|A_i is the start time of β.
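Projection onto a component is a filter that keeps each action's time and the start time. A minimal sketch, with a hypothetical helper `project` and behaviors encoded as (start time, list of (action, time)) pairs:

```python
def project(beta, actions_i):
    """beta|A_i: keep the actions of A_i with their times, and beta's start time."""
    start, gamma = beta
    return start, [(a, t) for (a, t) in gamma if a in actions_i]


beta = (0.0, [("DELIVER_TOKEN_1", 1.0), ("RETURN_TOKEN_1", 2.0),
              ("DELIVER_TOKEN_2", 3.0)])
print(project(beta, {"DELIVER_TOKEN_1", "RETURN_TOKEN_1"}))
# (0.0, [('DELIVER_TOKEN_1', 1.0), ('RETURN_TOKEN_1', 2.0)])
```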

The first lemma shows how we can "cut" a behavior of the composition into be- 
haviors of each of the pieces. The second lemma shows how we can "paste" a sequence 
of behaviors of the pieces into a behavior of the composition. 

Lemma 2.2.4 (Cut Lemma) Let {A_i, i ∈ I} be a compatible collection of automata
and let A = ∏_{i∈I} A_i. Let β be any behavior of A. Then β|A_i is a behavior of A_i for
every i ∈ I.

Lemma 2.2.5 (Paste Lemma) Let {A_i, i ∈ I} be a compatible collection of automata
and let A = ∏_{i∈I} A_i. Let β be a behavior sequence such that each action in β is an
external action of A. If β|A_i is a behavior of A_i for every i ∈ I, then β is a behavior
of A.

Finally, there is a hiding operator on automaton A with respect to some subset S
of the actions of A. The result of this operation is to create a new automaton that
is identical to A except that all actions in S are reclassified as internal actions.

2.2.2 Useful Notation 

In all the automata that we will describe in this thesis, the state of an automaton 
consists of values assigned to a set of variables. We use record notation to extract the 
values of specific variables in the state. We say that a variable x has a value v in state 
s if s. x = v. We sometimes omit the state s if it is clear by context which s we mean. 

When we refer to the state s_i in execution α we mean the i-th state in the sequence
α. An interval of an execution α is a contiguous subsequence of α that starts and ends
with a timed state. The duration of an interval [s_i, s_j] is s_j.time − s_i.time. An interval
of a behavior β is a contiguous subsequence of β.

A predicate of automaton A is a subset of the states of A. A predicate S is described 
by a Boolean formula on variables; a state s ∈ S iff the values of the variables of A
in state s satisfy the Boolean formula. Thus if S is x = 3 then s ∈ S iff s.x = 3. We
also say that S holds in state s or S is true in state s to mean s ∈ S. We say that a
predicate S remains true for time t after state s_i in execution α if for all states s_j that
occur within time t of s_i in α, s_j ∈ S.
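The predicate notation can be mirrored in code by treating a predicate as a Boolean function on states. In this sketch, states are dictionaries of variable values and `remains_true` is a hypothetical helper that follows the definition of "occurs within time t" given earlier, so it checks the states s_j with j > i whose time is at most s_i.time + t.

```python
def remains_true(pred, states, times, i, t):
    """True if pred holds in every state s_j with j > i and
    times[j] <= times[i] + t, i.e. every state within time t of s_i."""
    limit = times[i] + t
    return all(pred(states[j]) for j in range(i + 1, len(states))
               if times[j] <= limit)


def x_is_3(s):
    """The predicate x = 3."""
    return s["x"] == 3


states = [{"x": 1}, {"x": 3}, {"x": 3}, {"x": 4}]
times = [0.0, 1.0, 2.0, 5.0]
print(remains_true(x_is_3, states, times, 0, 2.0))  # True
print(remains_true(x_is_3, states, times, 0, 5.0))  # False
```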

2.2.3 Modelling Asynchronous Protocols 

In this thesis we will study "asynchronous" protocols. We wish such protocols to work 
regardless of timing assumptions. This is typically done using "fairness assumptions" 
instead of timing assumptions. However, we can model essentially the same thing by 
ensuring that our protocols work regardless of the value of t_c assigned to any class c.

The advantage of using parameterized upper bounds for classes comes in measuring 
time complexity. In the standard approach, after first proving correctness using "fair" 
executions, time complexity is then measured using the assumption that the class
times t_c are constants like 1 or 0. Often the time complexity arguments are extremely
similar to the liveness arguments used in the proof of correctness. In our approach
there is no need for this double effort; we replace liveness arguments by showing time
bounds on certain events occurring. These time bounds are parameterized in terms
of the class times t_c. Thus to obtain time complexity measures, we simply substitute
particular values for t_c.

Chapter 3

Stabilization: Definitions and Properties



In this chapter we will describe the basic definitions of stabilization we will use for 
the rest of the thesis. We begin in the first section with a state-based definition of 
stabilization that corresponds to the standard definitions in the literature. In the next 
section we describe a new definition of stabilization in terms of external behaviors. In 
Section 3.4 we describe a two-stage proof technique for proving stabilization properties.
Then in Section 3.5 we describe a modularity result. This result shows that (under 
certain conditions) we can prove facts about the stabilization of a big system by proving 
facts about the stabilization of each of its parts. 



3.1 Definitions of Stabilization based on Executions 

All the existing definitions of stabilization are in terms of the states and executions 
of a system. We will begin with a definition of stabilization that corresponds to the 
standard definitions (for example, that of Katz and Perry [KP90]). In the next section, 
we will describe another definition of stabilization in terms of external behaviors. We 
believe that the definition of behavior stabilization is appropriate for large systems 
that require modular proofs. However, the definition of execution stabilization given 
below is essential in order to prove results about behavior stabilization. We begin with 
the definition of execution stabilization since it is also the definition that most readers
are familiar with.

Suppose we define the correctness of an automaton in terms of a set C of legal 
executions. For example, recall that for the token passing system of Chapter 2 we 
defined the legal executions to be those in which there is exactly one token in every 
state, and in which every process periodically receives a token. 

What do we mean when we say that an automaton A stabilizes to the executions 
in set C in time t? Intuitively, we mean that within time t, all executions of A begin to
"look like" an execution in set C. For example, suppose C is the set of legal executions
of a token passing system. Then in the initial state of A there may be zero or more
tokens. However, the definition requires that from some point no later than time t after
the start of any execution of A, there is exactly one token in every state.

To formalize this, we begin with the definition of a t-suffix of an execution α.
Intuitively, this is a suffix of α whose first element occurs no more than time t after
the start of α. Although we will apply this definition only to executions, we will state
the definition in terms of an arbitrary sequence of timed elements. Recall from Chapter 
2 that a timed element is a tuple (x,t) where x is either a state or an action, and t is 
a time. 

Definition 3.1.1 Consider any two sequences of timed elements α and α'. We say
that α' is a t-suffix of execution α if:

• α'.start − α.start ≤ t and

• α' is a suffix of α.

We can now formally define execution stabilization to a set of executions: 

Definition 3.1.2 Let C be a set of sequences of timed elements. We say that automaton
A stabilizes to the executions in C in time t if for every execution α of A there is
some t-suffix of execution α that is in C.
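On a single finite sequence, the definition can be illustrated by enumerating the t-suffixes and testing each against a membership predicate for C. This is an illustration only: legal sets C normally contain infinite executions, and `one_token_everywhere`, the state format `{"tokens": k}`, and the action names `PASS` and `MERGE` are hypothetical stand-ins for a token passing system. Elements are (x, time) pairs with non-decreasing times.

```python
def t_suffixes(seq, t):
    """Non-empty suffixes seq[k:] whose first element occurs at most t
    after the start of seq (Definition 3.1.1)."""
    start = seq[0][1]
    return [seq[k:] for k in range(len(seq)) if seq[k][1] - start <= t]


def has_legal_t_suffix(seq, t, in_C):
    """Definition 3.1.2 applied to one sequence: some t-suffix is in C."""
    return any(in_C(s) for s in t_suffixes(seq, t))


def one_token_everywhere(seq):
    """Hypothetical stand-in for C: starts with a state, one token in every state."""
    states = [x for x, _ in seq if isinstance(x, dict)]
    return isinstance(seq[0][0], dict) and all(s["tokens"] == 1 for s in states)


alpha = [({"tokens": 2}, 0.0), ("PASS", 1.0), ({"tokens": 2}, 1.0),
         ("MERGE", 2.0), ({"tokens": 1}, 2.0), ("PASS", 3.0), ({"tokens": 1}, 3.0)]
print(has_legal_t_suffix(alpha, 2.0, one_token_everywhere))  # True
print(has_legal_t_suffix(alpha, 1.5, one_token_everywhere))  # False
```

With t = 2.0 the suffix beginning at the state reached after `MERGE` qualifies; with t = 1.5 every admissible suffix still contains a two-token state.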

We also make the accompanying definition of execution stabilization to another 
automaton: 

Definition 3.1.3 We say that automaton A stabilizes to the executions of automaton
B in time t if for every execution α of A there is some t-suffix of execution α that is
an execution of B.

So far, we have not made any assumptions about the automaton A that stabilizes 
to the executions specified by a set C or another automaton B. However, recall 
that in Dijkstra's original definition, an automaton is "self-stabilizing" if regardless of 
what state it starts in, an automaton "eventually" produces "legal" executions. Our 
definitions are more general than Dijkstra's definitions. To see this, we now extend 
Dijkstra's notion of "self-stabilization" to our timed setting. 

As a stepping stone, for any automaton A we define U(A) (which can be read as 
the unrestricted version of A) to be the automaton that is identical to A except that 
any state of A can be a start state of U(A). 

Definition 3.1.4 For any automaton A, we let U(A) denote the automaton that is 
identical to A except that start(U(A)) = states(A). 

Next, we can say that an automaton A "self-stabilizes" in time t if the following 
holds: even if we start A in a state other than one of its start states, the resulting
execution will begin to "look like" a properly initialized execution of A within time t. 
Formally: 

Definition 3.1.5 We say that an automaton A self-stabilizes in time t if U(A) stabilizes
to the executions of A in time t.

The following simple lemma shows that execution stabilization is transitive. This
is an important lemma because it allows us to prove execution stabilization in several
stages.

Lemma 3.1.6 If automaton A stabilizes to the executions of automaton B in time t_1
and B stabilizes to the executions of automaton C in time t_2, then A stabilizes to the
executions of C in time t_1 + t_2.

3.2 Definitions of Stabilization based on External Behavior

In Chapter 2, we argued that a major theme of the I/O Automaton model [LT89] is 
the focus on external behaviors of an automaton. For instance, the correctness of an 
automaton is specified in terms of its external behaviors. In Chapter 2, for example, we
showed how to specify the correctness of a token passing system without any reference
to the state of the system. We did this by specifying the ways in which token delivery
and token return actions can be interleaved.

Thus it is natural to look for a definition of stabilization in terms of external
behaviors. We would also hope that such a definition would allow us to modularly "compose"
results about the stabilization of parts of a system to yield stabilization results about 
the whole system. 

Typically, the correctness of an IOA is specified by a set of legal behaviors P. An
IOA A is said to solve P if the behaviors of A are contained in P. For stabilization, how- 
ever, it is reasonable to weaken this definition and ask only that an automaton exhibit 
correct behavior after some finite time. In most of this thesis, we will use the behavior 
stabilization definitions for specifying the stabilization properties of a system. 

As in the case of execution stabilization, we begin with the definition of a t-suffix
of a behavior β. Intuitively, this is a portion of β that starts at time no more than t
after the start of β. However, this is not as easy as defining a t-suffix of an execution.
Recall that a behavior β = (t_0, γ) consists of two components: a start time t_0 and a
sequence of timed actions γ. Thus we cannot simply define a t-suffix of β to be a suffix
of β as we did in the case of an execution. We would also like a t-suffix of β to be a
behavior sequence: thus the t-suffix must have a start time as well as a sequence of
timed actions.

Figure 3.1 is a pictorial view of a behavior β. The second row represents the
sequence of actions of the behavior and the first row represents the times corresponding
to each action as well as the start time of the behavior. The dashed line to the right of
the start time represents an instant of time that occurs no more than time t after the
start of the behavior. Now consider the behavior that starts at the time corresponding
to the dashed line and consists of all actions to the right of the dashed line. We
will call such a behavior a t-suffix of behavior β. Intuitively, as we said before, this

Figure 3.1: A pictorial view of a t-suffix of a behavior

represents a portion of β. However, the portion starts at time no more than t after the
start of β. Formally:

Definition 3.2.1 Consider any two behavior sequences β = t_0, γ and β' = t', γ'. We
say that β' is a t-suffix of behavior β if:

• β'.start − β.start ≤ t.

• γ' is a suffix of γ containing all actions in β that occur at times strictly greater
than β'.start.
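The treatment of actions that occur exactly at β'.start is the subtle part of this definition. The following Python sketch checks the definition for finite behaviors encoded as (start time, list of (action, time)) pairs; the name `is_behavior_t_suffix` is hypothetical, and the last check encodes the requirement that β' itself be a behavior sequence, so no action in γ' may precede t'.

```python
def is_behavior_t_suffix(beta, beta2, t):
    """Is beta2 = (t', gamma') a t-suffix of beta = (t_0, gamma)?
    Both are (start, [(action, time), ...]) with non-decreasing times."""
    (t0, gamma), (t1, gamma2) = beta, beta2
    if t1 - t0 > t:
        return False  # beta2 must start no more than t after beta
    k = len(gamma) - len(gamma2)
    if k < 0 or gamma[k:] != gamma2:
        return False  # gamma2 must be a suffix of gamma
    if any(time > t1 for _, time in gamma[:k]):
        return False  # every action strictly after t' must be kept
    return all(time >= t1 for _, time in gamma2)  # beta2 is a behavior sequence


beta = (0.0, [("a", 1.0), ("b", 2.0), ("c", 2.0), ("d", 3.0)])
cands = [(2.0, [("c", 2.0), ("d", 3.0)]),  # keeps one of the two actions at time 2.0
         (2.0, [("d", 3.0)]),              # keeps none of them
         (1.5, [("d", 3.0)])]              # drops "b" and "c", which follow 1.5
print([is_behavior_t_suffix(beta, c, 2.0) for c in cands])  # [True, True, False]
```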

Note that the definition allows γ' to contain some, none, or all of the actions in
β that occur at times equal to β'.start = t'. This definition is similar to, but
different from, the definition of a t-suffix of an execution. Using this, we can now
define behavior stabilization analogous to execution stabilization:

Definition 3.2.2 Let P be a set of behavior sequences. An IOA A stabilizes to the 
behaviors in P in time t if for every behavior β of A there is a t-suffix of behavior β
that is in P.

An automaton A is said to solve another automaton B if every behavior of A is a
behavior of B. Similarly, we can specify that A stabilizes to the behaviors of some 
other automaton B. 

Definition 3.2.3 An automaton A is said to stabilize to the behaviors of another
automaton B in time t if for every behavior β of A there is a t-suffix of behavior β
that is a behavior of B.

The following lemmas are "obvious" in that they are what we expect to be true. 

Lemma 3.2.4 Every automaton stabilizes to its own behaviors in time 0. 

Lemma 3.2.5 If every behavior of automaton A is a behavior of automaton B, then
A stabilizes to the behaviors of B in time 0. 

The next lemma is a transitivity result for behavior stabilization. It allows us to
prove behavior stabilization results in stages. 

Lemma 3.2.6 If automaton A stabilizes to the behaviors of automaton B in time t_1
and B stabilizes to the behaviors of automaton C in time t_2, then A stabilizes to the
behaviors of C in time t_1 + t_2.

Another obvious consequence of our definition is: 

Lemma 3.2.7 If automaton A stabilizes to the behaviors of an automaton B in time
t and t' ≥ t, then A stabilizes to the behaviors of B in time t'.

The previous lemma motivates a natural complexity measure called the stabilization 
time from A to B. Intuitively, this is the smallest time after which we are guaranteed 
that A will behave like B. However, since we are dealing with a potentially infinite 
set of behaviors we have to be a little more careful. 

Definition 3.2.8 The stabilization time from A to B is the infimum of all t such that 
A stabilizes to the behaviors of B in time t. 

The next lemma is simple but important because it ties together the execution and 
behavior stabilization definitions. It states that execution stabilization implies behav- 
ior stabilization. In fact, the only method we know to prove a behavior stabilization 
result is to first prove a corresponding execution stabilization result, and then use this 
lemma. Thus the behavior and execution stabilization definitions complement each 
other in this thesis: the former is typically used for specification and the latter is often 
used for proofs. 

Lemma 3.2.9 If automaton A stabilizes to the executions of automaton B in time t,
then automaton A stabilizes to the behaviors of B in time t.

Proof: Let β be any behavior of A. Let α be some execution of A corresponding to
β. From the hypothesis, there is some α' that is a t-suffix of execution α and is also
an execution of B. Consider the behavior β' of B corresponding to execution α' of B.
From the definitions, we can verify that β' is a t-suffix of behavior β: β'.start =
α'.start ≤ α.start + t = β.start + t, and since α' is a suffix of α whose times are
non-decreasing, the timed external actions of α' form a suffix of those of α that
includes every action occurring at a time strictly greater than α'.start. ∎



3.3 Discussion on the Stabilization Definitions 

First, notice that we have defined what it means for an arbitrary IOA to stabilize to
some target set or automaton. For most of the thesis we will be interested in proving 
stabilization properties only for a special kind of automata: unrestricted automata. 
An unrestricted IOA (see Section 3.5) is one in which all states of the automaton are 
also start states. Such an IOA models a system that has been placed in an arbitrary 
initial state by an arbitrary initial fault. However, (this important observation is due 
to Arora and Gouda [AG92]), we might also be interested in modelling the response 
of a system to more restricted kinds of initial faults. Such restricted faults initially 
place a system A in some subset L of the states of A. After the initial fault, we 
would like A to behave like some other automaton B. Thus our general definitions of 
stabilization are applicable to other, more restricted forms of initial faults. While we 
will not mention this explicitly from this point on, many of the techniques developed 
in this thesis can also be applied to more restricted initial fault models. 

Next, it is reasonable to ask whether our definitions are sufficient to cover all cases
that arise in practice. They do cover all the examples in this thesis. There are two
places where we could weaken our definitions. The first is that instead of requiring that all
behaviors (or executions) of A stabilize in time t, we might only require that certain
"well-formed" behaviors (or executions) of A stabilize. A "well-formed" behavior (execution)
is a behavior (execution) with certain restrictions on the input actions. For instance, if
A can pass a token to the external environment E, we might only require stabilization
for those behaviors (executions) in which E returns the token to A in bounded time.

The second possible modification (which is sometimes needed, though again not
in this thesis) is to require only (in Definitions 3.2.2 and 3.1.2) that the $t$-suffix of
a behavior (execution) be a suffix of a behavior (execution) of the target set. One
problem with this modified definition is that we know of no good technique to
prove that the behaviors (executions) of an automaton are suffixes of a specified set of
behaviors (executions).¹ By contrast, it is much easier to prove that every behavior of
an automaton has a suffix that is in a specified set.

Thus we prefer to use the simpler definitions for this thesis. 



3.4 Proof Technique 

We begin by defining an extremely useful piece of notation that is used extensively in
this thesis. This notation allows us to specify an IOA that is identical to another IOA
except for its start states. It is roughly the inverse of the U(A) notation, which creates
an unrestricted version of automaton A in which every state is a start state.

Definition 3.4.1 For any automaton A and any subset L of states(A), we denote by 
A\L the automaton that is identical to A except that start(A\L) = L. 

There has been a great deal of work in designing ordinary automata that have 
specific start states to solve specific problems. It would be nice to gain leverage from 
this existing body of work. Suppose we are given an IOA A that solves a set of 
behaviors P and we now wish to design an IOA B that stabilizes to the behaviors in 
P. Our goals are: 

• We would like to use as much of the design of A as possible to design B. 

• We would like to use as much of the proof that A solves P as possible to prove 
that B stabilizes to the behaviors in P. 

We now describe one way in which these goals can be achieved. The following 
lemma is immediate from the definitions. 



¹Such proofs seem to involve arguments about reachable states. Familiar inductive proof techniques
(such as invariant arguments, progress metrics, etc.) do not seem to suffice for this purpose.
Lemma 3.4.2 Consider an automaton A, a set $L \subseteq states(A)$, and a problem P. Suppose
the following conditions are true: 

• A stabilizes to the behaviors of A\L in time t. 

• Any behavior of A\L is in P. 

Then A stabilizes to the behaviors in P in time t. 

In the next two subsections, we describe the techniques used in this thesis for
proving the two conditions of this lemma.

3.4.1 Proving that an Automaton Solves a Problem 

To prove that every behavior of some automaton A is contained in some
problem P, it suffices to prove that every behavior of A is a behavior of some other
automaton B, and that every behavior of B is in P.

There are well-known techniques (e.g., [LT89]) to show that any behavior of an automaton
A is a behavior of another automaton B. A commonly used technique is a
refinement mapping. The basic idea is to establish a suitable mapping $f$ from the
states of A to the states of B. Given an execution $\alpha$ of A, we use $f$ to construct a
mapped execution of B that has the same external behavior as $\alpha$.

Theorem 3.4.3 Refinement Mapping: Let A and B be automata with the same
set of external actions. Let $t_c$ be the upper bound on the time to perform actions in
any class $c$ of B. Let $f$ be a mapping from the states of A to the states of B such that:

1. For any start state $s$ of A, $f(s)$ is a start state of B.

2. For all transitions $(s, \pi, s')$ of A, either

• $\pi$ is an action of B and $(f(s), \pi, f(s'))$ is a transition of B, OR

• $\pi$ is not an action of B and $f(s) = f(s')$.

3. For any class $c$ of B, any execution $\alpha$ of A, and any state $s$ in $\alpha$, suppose some
action in $c$ is enabled in $f(s)$. Then within $t_c$ time of $s$ in $\alpha$ either:

• Some action in class $c$ occurs, OR

• Some state $s'$ occurs such that no action in class $c$ is enabled in $f(s')$.

Then every behavior of A is a behavior of B.

Proof: Consider any execution $\alpha$ of A. We extend the function $f$ to map executions
as well as states in the following way. Let $f(\alpha)$ be the sequence formed from $\alpha$ by:

• Removing every timed action in $\alpha$ that is not a timed action of B, together with the timed
state following such an action.

• Replacing every timed state $(s, t)$ in the remaining sequence by $(f(s), t)$.

Since we retain every action of B, the behavior corresponding to $f(\alpha)$ is the behavior
corresponding to $\alpha$.

Next, we verify that $f(\alpha)$ is an execution of B. To do so we check the four
conditions in Definition 2.2.1. Let $\alpha = (s_0, t_0), (a_1, t_1), (s_1, t_1), \ldots$. Clearly, by
construction, $f(\alpha)$ is an alternating sequence of timed states and actions of B. Let

$$f(\alpha) = (s'_0, t'_0), (a'_1, t'_1), (s'_1, t'_1), \ldots$$

We check the four conditions in turn: 

1. First, $s'_0 = f(s_0) \in start(B)$ by hypothesis. Next consider $(s'_i, a'_{i+1}, s'_{i+1})$ for all
$i \ge 0$. Let $a_k$ be the $(i+1)$st action of B in $\alpha$. Intuitively, $a_k$ is the action in $\alpha$
that generated $a'_{i+1}$ in $f(\alpha)$. Clearly, $a_k = a'_{i+1}$.

Next, consider the smallest $j < k$ such that all actions between $s_j$ and $s_{k-1}$ in $\alpha$
are not actions of B. Then by construction, $f(s_j) = s'_i$ and $f(s_k) = s'_{i+1}$. Also,
by the hypothesis, $f(s_j) = f(s_{k-1})$ and $(f(s_{k-1}), a_k, f(s_k))$ is a transition of B.
Thus $(s'_i, a'_{i+1}, s'_{i+1})$ is a transition of B.

2. The second condition follows trivially from the construction. 

3. The third condition follows because any subsequence of a time sequence is a time 
sequence. 
4. Suppose some action in some class $c$ of B is enabled in some state $s'_i$ of $f(\alpha)$.
Then, by construction, there is a corresponding state $s_j$ in $\alpha$ such that if $\gamma$ is the
suffix of $\alpha$ starting with $s_j$, then $f(\gamma)$ is the suffix of $f(\alpha)$ starting with $s'_i$. Then,
by hypothesis, either:

• Some action in class $c$ of B occurs within $t_c$ time of $s_j$ in $\alpha$. But by
construction, the same action occurs within $t_c$ time of $s'_i$ in $f(\alpha)$.

• Some state $s_k$ occurs within $t_c$ time of $s_j$ in $\alpha$ such that all actions in
class $c$ are disabled in $f(s_k)$. Let $k$ be the smallest index for which this is
true. Now if $a_k$ is not an action of B, then since $f(s_{k-1}) = f(s_k)$, this
contradicts the assumption that $k$ is the smallest index with this property.
Thus $a_k$ is an action of B and hence, by construction, $(f(s_k), s_k.time)$ occurs
in $f(\gamma)$. Thus there is a state that occurs within $t_c$ time of $s'_i$ in $f(\alpha)$
such that no action of $c$ is enabled in this state. ∎
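For finite automata given as explicit transition sets, conditions 1 and 2 of Theorem 3.4.3 can be checked mechanically. The Python sketch below does this under the following assumptions:

- Automata are represented by hypothetical sets of start states and (state, action, state) triples.
- The two example automata are invented for illustration.
- Condition 3, the timing condition, is deliberately not checked, since it concerns executions rather than single transitions.

```python
from typing import Callable, Hashable, Set, Tuple

State = Hashable
Transition = Tuple[State, str, State]


def check_refinement_steps(
    start_a: Set[State],
    trans_a: Set[Transition],
    start_b: Set[State],
    trans_b: Set[Transition],
    actions_b: Set[str],
    f: Callable[[State], State],
) -> bool:
    """Check conditions 1 and 2 of a refinement mapping f from A to B.

    The timing condition (condition 3) is not checked by this sketch.
    """
    # Condition 1: every start state of A maps to a start state of B.
    if any(f(s) not in start_b for s in start_a):
        return False
    for (s, pi, s_next) in trans_a:
        if pi in actions_b:
            # Condition 2, first case: B must have the matching mapped step.
            if (f(s), pi, f(s_next)) not in trans_b:
                return False
        elif f(s) != f(s_next):
            # Condition 2, second case: a step that is not an action of B
            # must leave the mapped state unchanged.
            return False
    return True


# Hypothetical A: a one-bit counter whose state also holds a flag. The internal
# action "prep" sets the flag; the external action "inc" flips the bit.
start_a = {(0, False)}
trans_a = {((x, False), "prep", (x, True)) for x in (0, 1)} | {
    ((x, True), "inc", ((x + 1) % 2, False)) for x in (0, 1)
}

# Hypothetical B: a one-bit counter with only the external action "inc".
start_b = {0}
trans_b = {(0, "inc", 1), (1, "inc", 0)}


def good_f(s):
    return s[0]  # forget the flag


def bad_f(s):
    return (s[0] + int(s[1])) % 2  # "prep" would change the mapped state


print(check_refinement_steps(start_a, trans_a, start_b, trans_b, {"inc"}, good_f))  # True
print(check_refinement_steps(start_a, trans_a, start_b, trans_b, {"inc"}, bad_f))   # False
```

The second mapping fails because an internal step of A would have to be matched by a change in B's state with no corresponding action of B.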



3.4.2 Proving that an Automaton Stabilizes to another Automaton

We give one technique for proving such results in the following definition and theorem.
The technique is similar to techniques used for proving liveness properties (e.g., [OL82, MP91]) of
concurrent programs. Our theorem is a small generalization of a theorem for proving
stabilization properties that was previously proposed in [GM90].

Let L be a closed predicate of automaton A: once L is true in any execution
of A, L remains true for the rest of the execution. We would like to prove that in
any execution of A, L becomes (and stays) true in bounded time. This implies that A
stabilizes to the executions of A\L in bounded time. We will describe a proof rule for
this purpose. Intuitively, instead of proving directly that the goal L eventually holds,
we prove that a number of subgoals $L_i$ (each of which is a predicate of A) become and
stay true. The $L_i$ are chosen so that L is true if all the individual $L_i$ are true.

Each subgoal $L_i$ can be intuitively thought of as "depending" on some other set of
subgoals. This dependence relation is formalized by a partial order $<$. If $j < i$, then
(intuitively) $L_i$ "depends" on $L_j$. Suppose we could show that if all the subgoals that
$L_i$ depends on become and stay true, then $L_i$ becomes and stays true. Then we can
conclude that eventually all subgoals (and hence L) become and stay true. Intuitively,
this follows because the dependency relation, being a partial order, is acyclic.

We can extend this idea to a timed setting to show that L will become true in time
proportional to the maximum length of a "dependency chain". A dependency chain is
formalized using the standard concept of a chain in a partially ordered set. Consider
a set $\{L_i, i \in I\}$ where $(I, <)$ is a finite partially ordered set. A chain is simply a
sequence $L_1 < L_2 < \cdots < L_l$.

The preceding discussion motivates the following definition and theorem. 

Definition 3.4.4 We say that an automaton A is stabilized to predicate L using predicate
set $\mathcal{L}$ and time constant $t$ if:

1. $\mathcal{L} = \{L_i, i \in I\}$ is a set of sets of states of A, where $(I, <)$ is a finite, partially ordered
index set. We let $height(\mathcal{L})$ denote the maximum length of a chain in the partial
order.

2. $\bigcap_{i \in I} L_i \subseteq L$.

3. For all $i \in I$ and for all steps $(s, \pi, s')$ of A, if $s$ belongs to $\bigcap_{j \le i} L_j$, then $s'$ belongs
to $L_i$.

4. For every $i \in I$, every execution $\alpha$ of A, and every state $s$ in $\alpha$, the following
is true. Suppose that either $s \in \bigcap_{j < i} L_j$ or there is no $L_j < L_i$. Then there is
some state $s' \in L_i$ that occurs within time $t$ of $s$ in $\alpha$.

The first condition says there is a partial order on the predicates in $\mathcal{L}$. The second
says that L becomes true when all the predicates in $\mathcal{L}$ become true. The third is
a stability condition. It says that any transition of A leaves a predicate $L_i$ true if all
the predicates less than or equal to $L_i$ are true in the previous state. Finally, the last
item is a liveness condition. It says that if all the predicates strictly less than $L_i$
are true in a state, then within time $t$ after this state, $L_i$ will become true.

We define $height(L_i)$, the height of a predicate $L_i \in \mathcal{L}$, to be the maximum length
of a chain that ends with $L_i$ in the partial order. The value of $height(\mathcal{L})$ is, of course,
the maximum height of any predicate $L_i \in \mathcal{L}$. By the liveness condition, within time
$t$ all predicates with height 1 become true; these predicates stay true for the rest of
the execution because of the third (stability) condition. In general, we can prove by
induction that within time $i \cdot t$ all predicates with height $i$ become and stay true. This
leads to a simple but useful theorem:

Theorem 3.4.5 Execution Convergence: Suppose that automaton A is stabilized
to predicate L using predicate set $\mathcal{L}$ and time constant $t$. Then A stabilizes to the
executions of A\L in time $height(\mathcal{L}) \cdot t$.

Proof: Fix any execution $\alpha$ of A. We proceed by induction on $h$, $0 \le h \le height(\mathcal{L})$,
using the following inductive hypothesis.

Inductive Hypothesis: There is some state $s$ that occurs within time $h \cdot t$ of the
start of $\alpha$ such that $s \in L_i$ for all $L_i \in \mathcal{L}$ with $height(L_i) \le h$.

For $h = height(\mathcal{L})$, the inductive hypothesis implies that there is some state $s$ that occurs
within time $height(\mathcal{L}) \cdot t$ of the start of $\alpha$ and such that $s \in L$ (by the second condition
in Definition 3.4.4). The theorem follows from this fact taken together with Lemma 2.2.3.

Base Case, $h = 0$: Follows trivially since there is no $L_i \in \mathcal{L}$ with $height(L_i) = 0$.

Inductive Step, $0 \le h \le height(\mathcal{L}) - 1$: Assume the hypothesis is true for $h$. Then there is
some state $s$ that occurs within time $h \cdot t$ of the start of $\alpha$ such that $s \in L_i$ for all $L_i \in \mathcal{L}$
with $height(L_i) \le h$. Consider any $L_j \in \mathcal{L}$ with $height(L_j) = h + 1$. By the
fourth condition in Definition 3.4.4, there must be some state $s_{f(j)} \in L_j$ that
occurs within time $t$ of $s$ in $\alpha$. Let $k = \max\{f(j) : height(L_j) = h + 1\}$. Then from
the third condition in Definition 3.4.4, we see that $s_k$ occurs within time $(h + 1) \cdot t$ of
the start of $\alpha$ and that $s_k \in L_i$ for all $L_i \in \mathcal{L}$ with $height(L_i) \le h + 1$. ∎
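Theorem 3.4.5 reduces the stabilization time to a property of the finite poset $(I, <)$. The bound is $height(\mathcal{L}) \cdot t$, where the height of a predicate counts the elements of the longest chain ending at it. The Python sketch below computes these heights and the resulting bound for a small hypothetical dependency order. The predicate names and the value of $t$ are made up for illustration, and the order is assumed to be acyclic, as a partial order must be.

```python
from functools import lru_cache
from typing import Dict, Hashable, Set


def heights(below: Dict[Hashable, Set[Hashable]]) -> Dict[Hashable, int]:
    """Height of each index i: the length of the longest chain ending at L_i.

    below[i] is the set of indices j with L_j < L_i. It may list all smaller
    indices or only the immediate ones; the longest-chain length is the same.
    """

    @lru_cache(maxsize=None)
    def height(i: Hashable) -> int:
        # Minimal predicates have height 1; otherwise extend the tallest chain below.
        return 1 + max((height(j) for j in below[i]), default=0)

    return {i: height(i) for i in below}


# Hypothetical subgoals: L_c depends on L_a and L_b, and L_d depends on L_c.
order = {"a": set(), "b": set(), "c": {"a", "b"}, "d": {"c"}}
t = 2.0

hs = heights(order)
bound = max(hs.values()) * t
print(hs)     # {'a': 1, 'b': 1, 'c': 2, 'd': 3}
print(bound)  # 6.0
```

Reading the heights level by level mirrors the induction in the proof. By time $1 \cdot t$ the predicates of height 1 hold, by $2 \cdot t$ those of height 2, and so on up to $height(\mathcal{L}) \cdot t$.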



3.5 Modularity Theorem 

We will mostly deal with stabilization properties of a special class of automata called 
unrestricted automata or UIOA. Intuitively, a UIOA models a system that can start 
in an arbitrary state. 

Definition 3.5.1 A UIOA A is an automaton such that start(A) = states(A) (i.e., 
all states are start states). 
However, we will often show that a UIOA A stabilizes to the behaviors of a second
special kind of automaton called a Closed I/O Automaton, or CIOA. We define the
reachable states of an automaton A to be the states that can occur in executions of A.

Definition 3.5.2 A CIOA is an automaton such that every reachable state is also a 
start state. 

It is easy to see that: 

Lemma 3.5.3 Every UIOA is a CIOA.
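For a finite automaton given explicitly, Definitions 3.5.1 and 3.5.2 can be checked directly. This also makes Lemma 3.5.3 easy to see: if every state is a start state, every reachable state certainly is. The Python sketch below is a hypothetical illustration with the following assumptions:

- Timing is ignored, and a state counts as reachable whenever some finite sequence of transitions leads to it from a start state.
- The automaton and its states are invented.

```python
from collections import deque
from typing import Hashable, Set, Tuple

State = Hashable
Transition = Tuple[State, str, State]


def reachable(start: Set[State], trans: Set[Transition]) -> Set[State]:
    """States reachable from a start state by following transitions (timing ignored)."""
    seen = set(start)
    frontier = deque(start)
    while frontier:
        s = frontier.popleft()
        for (u, _action, v) in trans:
            if u == s and v not in seen:
                seen.add(v)
                frontier.append(v)
    return seen


def is_uioa(states: Set[State], start: Set[State]) -> bool:
    # Definition 3.5.1: every state is a start state.
    return start == states


def is_cioa(start: Set[State], trans: Set[Transition]) -> bool:
    # Definition 3.5.2: every reachable state is a start state.
    return reachable(start, trans) <= start


# Hypothetical automaton: a mod-3 counter on states 0..2 plus a "garbage"
# state 3 whose only transition leads back into the counter.
states = {0, 1, 2, 3}
trans = {(0, "inc", 1), (1, "inc", 2), (2, "inc", 0), (3, "inc", 0)}

print(is_cioa({0}, trans))                                  # False
print(is_uioa(states, states), is_cioa(states, trans))      # True True
closed_l = {0, 1, 2}                                        # no transition leaves this set
print(is_uioa(states, closed_l), is_cioa(closed_l, trans))  # False True
```

The last line shows an automaton of the form A\L, with L closed under transitions, that is a CIOA without being a UIOA.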

The following two lemmas are extremely convenient and are used often below without
explicit reference. They are the reason we allow executions and behaviors to start with
arbitrary values of time. Also, both lemmas depend crucially on the fact that
there are no lower bounds on the time between actions.

Lemma 3.5.4 Consider any execution $\alpha$ of a CIOA A. Any suffix of $\alpha$ that starts
with a timed state is also an execution of A.

Proof: Follows directly from Lemma 2.2.3 and the definition of a CIOA. ∎

Suppose we begin to view an automaton after it has "run for a while" and the 
resulting behavior is indistinguishable from an ordinary behavior of the automaton. 
Then, intuitively, we say that the automaton is suffix-closed. More formally: 

Definition 3.5.5 We say that an automaton A is suffix-closed if for every behavior
$\beta$ of A and every time $t > 0$, every $t$-suffix of behavior $\beta$ is a behavior of A.

A remarkable number of interesting automata we will study in this thesis are suffix- 
closed. This fact is explained by the following lemma: 

Lemma 3.5.6 Any CIOA A is suffix-closed. 
Figure 3.2: Obtaining the suffix of an execution corresponding to a $t$-suffix of the behavior of the execution.

Proof: We will only sketch the main idea of the proof. Consider any behavior $\beta$ of A
and any $t > 0$. Let $\beta'$ be any $t$-suffix of behavior $\beta$. Consider any execution $\alpha$ of A
such that the behavior of $\alpha$ is $\beta$. The proof consists of using $\alpha$ to construct another
execution $\alpha'$ of A such that the behavior of $\alpha'$ is $\beta'$; $\alpha'$ is essentially a suffix of $\alpha$ whose
start time is adjusted to match the start time of $\beta'$.

The behavior $\beta$ and corresponding execution $\alpha$ are sketched in Figure 3.2. By
definition, for every action in $\beta$ there is a corresponding external action in $\alpha$ that
occurs at the same time. This is sketched by drawing the action in the behavior directly
above the corresponding action in the execution. (However, since the execution will, in
general, have internal actions not included in the behavior, the indices of the actions
will not necessarily match. Thus, in the figure, an action of $\beta$ corresponds to an action
$a_p$ of $\alpha$ whose index may differ.)

The $t$-suffix $\beta'$ can be sketched using a vertical time line: $\beta'$ contains all actions in $\beta$
occurring to the right of the line (see Figure 3.2). The line is drawn between two actions in $\beta$
because the start time of $\beta'$ may fall between the times of two actions in $\beta$.

We need a suffix $\alpha'$ of $\alpha$ whose behavior is equal to $\beta'$. Thus we look for a state
$s_x$ in $\alpha$ corresponding to the vertical time line drawn in Figure 3.2. But we may not
have a state in $\alpha$ whose time is equal to the start time of $\beta'$. So (intuitively) we choose
$s_x$ to be the nearest state that occurs to the "left" of the vertical time line. Then we
choose $\alpha'$ to be the suffix of $\alpha$ starting with $s_x$, with the time of $s_x$ adjusted to
equal $\beta'.start$. This works for two reasons. First, by definition of a CIOA, $s_x$
is a start state of A. Second, we have no lower bounds on the time between actions
in A. Thus increasing the time of the initial state of an execution (provided the
resulting time is no greater than the time of the first action) still leaves us with a legal
execution. ∎

The suffix-closed property is not just an interesting curiosity. It also provides the
basis for the important Modularity Theorem that we discuss next.

Our Modularity Theorem about the stabilization of composed automata may seem
"obvious". We would expect that if each piece $A_i$ of a composed system stabilizes to the
behaviors of, say, $B_i$, then the composition of the $A_i$ should stabilize to the composition
of the $B_i$. Sadly, this is not quite true. A counterexample, described in Section
3.5.1, shows that if we allow some of the $B_i$ to be arbitrary automata, then this
statement is false.

The main problem is that for a given behavior of the system A, the component
automata $A_i$ may stabilize at different times. But if each of the $A_i$ begins to "look
like" the corresponding $B_i$ at a different time, then it may not be possible to paste the
resulting component behaviors into a behavior that "looks like" a behavior of B. However, this
problem does not arise if each of the $B_i$ is suffix-closed. Thus we have the following
result.

Theorem 3.5.7 Modularity: Let I be a finite index set. Let $\mathcal{A}_I = \{A_i, i \in I\}$ be a
compatible set of automata and $\mathcal{B}_I = \{B_i, i \in I\}$ be a second set of compatible,
suffix-closed automata. Suppose also that for all $i \in I$, $A_i$ stabilizes to the behaviors of $B_i$ in
time $t$. Let $A = \prod_{i \in I} A_i$ and $B = \prod_{i \in I} B_i$. Then A stabilizes to the behaviors of B in
time $t$.

Proof: The proof relies on the Cut Lemma (Lemma 2.2.4), which allows us to dissect
a behavior of a system into its component behaviors, and the Paste Lemma
(Lemma 2.2.5), which allows us to paste component behaviors into a system behavior.

Consider any behavior $\beta$ of A. Consider the $\beta'$ that is a $t$-suffix of behavior $\beta$ and
such that:

• $\beta'.start = t + \beta.start$.

• Any actions in $\beta'$ occur at times strictly greater than $t + \beta.start$.

We can verify that such a $\beta'$ exists from the definition of a $t$-suffix of a behavior.
Intuitively, $\beta'$ is chosen so that all component behaviors are guaranteed to have
stabilized in $\beta'$.
Now consider any $i \in I$. By the Cut Lemma (Lemma 2.2.4), $\beta|A_i$ is a behavior
of $A_i$. But because $A_i$ stabilizes to the behaviors of $B_i$ in time $t$, there must be some
$t' \le t$ and some $\beta_i$ such that:

• $\beta_i$ is a $t'$-suffix of $\beta|A_i$ and is also a behavior of $B_i$.

• $\beta_i.start = t' + \beta.start$.

Next consider $\beta'|A_i$. It can be verified that $\beta'|A_i$ is a $t''$-suffix of $\beta_i$ for some $t''$.
Thus, since $B_i$ is suffix-closed, $\beta'|A_i$ is a behavior of $B_i$. Hence $\beta'|A_i$ is a
behavior of $B_i$ for all $i$, and by the Paste Lemma (Lemma 2.2.5), $\beta'$ is a behavior
of B. The theorem follows since $\beta'$ is a $t$-suffix of behavior $\beta$. ∎

3.5.1 Discussion of the Modularity Theorem 

In the hypothesis of the modularity theorem, we assumed that each of the $B_i$ was
suffix-closed. We now give a counterexample showing that if the $B_i$ are allowed to be arbitrary
automata, then the theorem is false. Consider the specification of automaton $B_i$ shown
in Figure 3.3. Let $A_i$ be a UIOA that is identical to $B_i$ except that the start states of
$A_i$ are unrestricted (i.e., the initial value of $count_i$ in $A_i$ can be any value in the range
$\{0, \ldots, 2\}$).

It is easy to see that $A_i$ stabilizes to the behaviors of $B_i$ in time $3t$ because within
that time the value of $count_i$ must reach 0. After such a state, any behavior of $A_i$
is a behavior of $B_i$. Now consider the index set $I = \{1, 2\}$. Let A be the
composition of $A_1$ and $A_2$, and let B be the composition of $B_1$ and $B_2$. We claim
that A does not stabilize to B in time $3t$ (or, in fact, in any finite time).

To see this, we start with the following observation. In any behavior of B in which
the actions of $B_1$ and $B_2$ strictly alternate, the counter values output will
be of the form 0, 0, 1, 1, 2, 2, 0, 0, .... Now consider the behavior corresponding to
an execution of A in which $count_1 = 0$ initially and $count_2 = 2$ initially, and the actions
of $A_1$ and $A_2$ strictly alternate starting with $A_1$. Then the counter values output in
such an execution will be of the form 0, 2, 1, 0, 2, 1, 0, 2, .... From the earlier observation,
it follows that there is no suffix of this behavior of A that is a behavior of B.
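The two counter sequences in this argument are easy to reproduce. The Python sketch below runs the counter of Figure 3.3 for two components whose actions strictly alternate. It runs once from the start state of B and once from the initial state of A used above.

It then applies one necessary test. In a behavior of B, each component's first output is 0, so a strictly alternating suffix can only be a behavior of B if its first two outputs are both 0. The helper names are hypothetical, and checking a finite prefix only illustrates the argument; it is not a proof.

```python
from typing import List


def alternating_outputs(count1: int, count2: int, steps: int) -> List[int]:
    """Hypothetical helper: values output by Increment_1, Increment_2, ... in strict alternation."""
    counts = [count1, count2]
    outputs = []
    for n in range(steps):
        i = n % 2                          # component 1 on even steps, 2 on odd steps
        outputs.append(counts[i])          # Increment_i(k) outputs k = count_i
        counts[i] = (counts[i] + 1) % 3    # Effect: count_i := (count_i + 1) mod 3
    return outputs


def may_be_b_suffix_at(outputs: List[int], p: int) -> bool:
    # Hypothetical helper, necessary condition only: both components' first outputs must be 0.
    return outputs[p] == 0 and outputs[p + 1] == 0


b_run = alternating_outputs(0, 0, 8)   # B: both counters start at 0
a_run = alternating_outputs(0, 2, 8)   # A: count_1 = 0 and count_2 = 2 initially
print(b_run)  # [0, 0, 1, 1, 2, 2, 0, 0]
print(a_run)  # [0, 2, 1, 0, 2, 1, 0, 2]

long_a = alternating_outputs(0, 2, 60)
print(any(may_be_b_suffix_at(long_a, p) for p in range(len(long_a) - 1)))  # False
```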
The state of $B_i$ consists of an integer variable $count_i \in \{0, \ldots, 2\}$.
The initial value of $count_i$ is 0. (* i.e., $B_i$ is not a UIOA *)

$Increment_i(k)$ (* output action, outputs the counter value using parameter k *)
    Precondition: $k = count_i$
    Effect: $count_i := (count_i + 1) \bmod 3$

Any $Increment_i$ action is in a separate class with upper bound $t$.

Figure 3.3: Specification for Automaton $B_i$



Suppose we weakened the definition of stabilization to allow any $t$-suffix of a behavior of A to be a
suffix of a behavior of B. With this weaker definition, the counterexample disappears.²
Thus the suffix-closed requirement may be a consequence of our (stronger) stabilization
definition. However, we did not find a way to prove the modularity theorem using
the weaker definition. Without the suffix-closed requirement it is difficult to "paste"
together behavior suffixes of the $B_i$'s to create a behavior of B. One possible research
direction would be to look for weaker conditions (than the $B_i$ being suffix-closed)
under which the modularity theorem would hold. Another research direction would
be to extend these results to systems with non-zero lower bounds on the time between
actions.



3.6 Summary 

The two main contributions of this chapter are the definitions of stabilization in terms
of external behaviors (Definitions 3.2.2 and 3.2.3) and the modularity theorem (Theorem 3.5.7).

The definitions of stabilization in terms of external behaviors are different from
previous definitions that are in terms of the states and executions of the underlying
automaton. The external behavior definitions allow us to say that automaton A
stabilizes to another automaton B even though A and B have different state sets.
This is most useful when A is a low-level model (e.g., an implementation) and B is a
high-level model (e.g., a specification).

²I am grateful to Robert Gallager and Victor Luchanko of MIT for pointing this out.

The modularity theorem gives us a formal basis for a building-block approach: it 
allows us to prove facts about the stabilization of a big system by proving facts about 
the stabilization of each of its parts. The requirement that each of the target automata 
be suffix-closed may seem restrictive. However, in a stabilizing setting this is not the 
case. Most interesting specifications (for problems such as message delivery, routing, 
and scheduling) are either suffix-closed or can be rephrased so they are suffix-closed.

We have already seen that any UIOA (an automaton for which every state is a start
state) is suffix-closed. Similarly, any CIOA (an automaton for which every reachable
state is a start state) is suffix-closed. In a stabilizing setting the basic building blocks
are UIOAs since they model systems that can start in an arbitrary state. The methods
developed in this thesis, on the other hand, tend to construct automata that are
CIOAs.³ Thus we can build stabilizing solutions modularly in several stages. In the
first stage we compose a set of UIOAs to yield a CIOA. In the next stage, the resulting
CIOAs are composed with other UIOAs to yield more CIOAs. This process can be
repeated indefinitely to build a complex stabilizing solution to a problem. Since all the
pieces used are suffix-closed, the modularity theorem can be used at each stage. As
an example, the stabilizing spanning tree protocol of Chapter 8 is constructed using
the stabilizing reset protocol of Chapter 7, which in turn can be constructed using a
stabilizing Data Link implementation. Thus, despite its restrictions, the modularity
theorem is extremely useful in this thesis.

We also have a definition of stabilization in terms of executions that corresponds
to the standard definition (Definition 3.1.2). This definition in terms of executions is
used in this thesis for two reasons. First, it is sometimes useful in its own right when
the alternative would entail adding many superfluous actions.⁴ Second, the definition
in terms of executions is essential for proving results about stabilization of behaviors,
because our main tool for proving such results is Lemma 3.2.9.

The definitions we use give us several nice properties that we believe any definition
of stabilization should provide. The properties we believe to be important are: transitivity
for both behavior and execution stabilization (Lemma 3.2.6 and Lemma 3.1.6),
the fact that execution stabilization implies behavior stabilization (Lemma 3.2.9), the
fact that stabilization in time t implies stabilization in any time greater than t (Lemma
3.2.7), and the Modularity Theorem (Theorem 3.5.7).

The index contains pointers to the definitions given in the last two chapters. The
appendix also contains a list of commonly used notation for easy reference.

³Our methods construct automata that stabilize to a specification automaton of the form A\L. If L
is a closed predicate of A (i.e., no transition of A can falsify L), then A\L is a CIOA.

⁴For example, the correctness of a spanning tree protocol is easily defined in terms of a parent
variable at each node. For correctness, we could specify that the graph induced by the parent variables
be a spanning tree of the network. An external behavior specification would require additional output
actions to report the value of the parent variables at each node.
Chapter 4

Local Checking and Correction in a 
Shared Memory Model 



In Dijkstra's [Dij74] model, a network protocol is modelled using a graph of finite 
state machines. In a single move, a single node is allowed to read the state of its 
neighbors, compute, and then possibly change its state. In a real distributed system 
such atomic communication is impossible. Typically communication has to proceed 
through channels. Such channels must be modelled explicitly as state machines that
can store messages sent from one node to another. Also, in message passing models, the 
channel state machine is essentially fixed (with actions to send and deliver packets) but 
the node state machines can be arbitrarily specified by the protocol designer. However, 
in Dijkstra's model all state machines are node state machines and can be arbitrarily 
specified by the protocol designer. 

While Dijkstra's original model is not very realistic, it is probably the simplest
model of an asynchronous distributed system. This simple model provided an ideal
vehicle for introducing the concept of stabilization [Dij74] without undue complexity.
For this chapter only, we will use Dijkstra's original model to introduce the method of 
local checking and correction. In later chapters, we will use a more realistic message 
passing model. Thus the goals of this chapter are: 

• To describe some simple examples of local checking and correction that are more
interesting than the trivial examples given in Chapter 1.

• To show that existing work in [Dij74] and [AG90] can be understood very succinctly
using the framework of local checking and correction.

The main result of the chapter is a theorem (Theorem 4.3.1) that states that 
any locally checkable protocol on a tree can be efficiently stabilized. To motivate this 
theorem, we begin in Section 4.2 with a reset protocol [AG90] due to Arora and Gouda. 
We examine the behavior of the Arora-Gouda protocol in a good state and conclude 
that the protocol is in a good state when all link subsystems are in a good state. Then 
we show how to add correction actions to the protocol such that if a link subsystem 
is in a bad state, it can be corrected to a good state. We also determine an order 
in which link subsystems can be corrected so as to ensure that the correction process 
converges. 

In Section 4.3 we generalize the procedure followed in Section 4.2 to obtain Theorem 4.3.1.
Then in Section 4.4 we show how one of Dijkstra's protocols [Dij74] can
easily be understood using Theorem 4.3.1.



4.1 Modelling Shared Memory Protocols 

We will use the version of the timed I/O Automaton model [MMT91] described in 
Chapter 2. How can we map Dijkstra's model into this model? Suppose each node in 
Dijkstra's model is a separate automaton. Then in the Input/Output automata model, 
it is not possible to model the simultaneous reading of the state of neighboring nodes. 
The solution we use is to dispense with modularity and model the entire network as a 
single automaton. All actions, such as reading the state of neighbors and computing, 
are internal actions. The asynchrony in the system, which Dijkstra modelled using a 
"demon", is naturally a part of our model. Also, we will describe the correctness of 
Dijkstra's systems in terms of executions of the automaton. 



4.2 A Reset Protocol on a Tree 

Before describing the reset protocol due to Arora and Gouda [AG90], we first describe 
the network reset problem. 






Recall that we have a collection of nodes that communicate by reading the state 
of their neighbors. The interconnection topology is described by an arbitrary graph. 
Assume that we are given some application protocol that is being executed by the 
nodes. We wish to superimpose a reset protocol over this application such that when 
the reset protocol is executed the application protocol is "reset" to some "legal" global 
state. A "legal" global state is allowed to be any global state that is reachable by the 
application protocol after correct initialization. The problem is called distributed reset 
because reset requests may arrive at any node. 

A simple and elegant network reset protocol is due to Finn [Fin79]. In this protocol 
each node i running the application protocol has a session number. When the reset 
protocol is not running, the session numbers at every node are the same. When a node 
receives a reset request, it resets the local state of the application (to some prespecified 
initial state) and increments its session number by 1. When a node sees that a neighbor 
has a higher session number, it changes its session number to the higher number and 
resets the application. Finally, the application protocol is modified so that a node 
cannot make a move until its session number is the same as that of its neighbors. This 
check prevents older instances of the application protocol from "communicating" with 
newer instances of the protocol. This protocol is shown to be correct [Fin79] if all the 
session numbers are initially zero and the session numbers are allowed to grow without 
bound. 
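To make the mechanism concrete, here is a minimal Python sketch of Finn's scheme as just described. It assumes a static graph given as an adjacency dictionary and uses an integer placeholder for the application state. The class and method names (`FinnReset`, `request_reset`, `adopt`, `can_move`) are hypothetical, and session numbers are plain unbounded Python integers, which is exactly the assumption the next paragraph rejects.

```python
from typing import Dict, List


class FinnReset:
    """Sketch of Finn's reset protocol with unbounded session numbers (hypothetical names)."""

    def __init__(self, neighbors: Dict[int, List[int]], initial_app: int = 0) -> None:
        self.neighbors = neighbors
        self.initial_app = initial_app                  # prespecified initial application state
        self.session = {i: 0 for i in neighbors}        # correct start: all session numbers zero
        self.app = {i: initial_app for i in neighbors}  # placeholder for application state

    def request_reset(self, i: int) -> None:
        # A reset request resets the local application and increments the session number.
        self.app[i] = self.initial_app
        self.session[i] += 1

    def adopt(self, i: int) -> bool:
        # A node that sees a higher session number at a neighbor adopts it and resets.
        highest = max((self.session[j] for j in self.neighbors[i]), default=self.session[i])
        if highest > self.session[i]:
            self.session[i] = highest
            self.app[i] = self.initial_app
            return True
        return False

    def can_move(self, i: int) -> bool:
        # The application may move at i only if all neighbors are in the same session.
        return all(self.session[j] == self.session[i] for j in self.neighbors[i])

    def run_until_quiet(self) -> None:
        # Keep sweeping adoption steps until no node changes.
        changed = True
        while changed:
            changed = False
            for i in self.neighbors:
                if self.adopt(i):
                    changed = True


line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
net = FinnReset(line)
net.app[3] = 42          # pretend the application has made progress at node 3
net.request_reset(0)
print(net.can_move(1))   # False: node 1 still belongs to the old session
net.run_until_quiet()
print(net.session)       # {0: 1, 1: 1, 2: 1, 3: 1}
print(net.app[3])        # 0: node 3's application was reset
```

The `can_move` check is what keeps the old and new instances of the application apart while the higher session number spreads.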

We rule out the use of unbounded session numbers as unrealistic. Also, in a stabi- 
lizing setting, having a "large enough" size for a session number does not work. This 
is because the reset protocol can be initialized with all session numbers at their max- 
imum value. Thus, we are motivated to search for a reset protocol that uses bounded 
session numbers. Suppose we could design a reset protocol with unbounded numbers
in which the difference between the session numbers at any two nodes is at most one in
any state. Suppose also that for any pair of neighboring nodes u and v that compare
session numbers, the session number of one of the nodes (say u) is always no less than
the session number of the other node. Then, since the session numbers are only used
for comparisons, it suffices to replace the session numbers by a single bit that we call
$sbit_i$. This is the first idea in Arora and Gouda's reset protocol [AG90].
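Why a single bit suffices can be checked mechanically. The Python sketch below (function names hypothetical) compares two session numbers using only their parities and confirms, over a range of values, that this agrees with the full comparison whenever the two assumptions above hold. It also shows the ambiguity that appears once a difference of two is allowed.

```python
def compare_by_bit(bit_u: int, bit_v: int) -> str:
    """Compare the session numbers of u and v using only their low-order bits.

    Valid only under the two assumptions in the text: session(u) >= session(v)
    and the two numbers differ by at most one.
    """
    return "equal" if bit_u == bit_v else "u ahead"


def compare_full(s_u: int, s_v: int) -> str:
    """Reference comparison on the full (unbounded) session numbers."""
    return "equal" if s_u == s_v else "u ahead"


# Under the invariant, the only possible pairs are (s, s) and (s + 1, s).
for s_v in range(100):
    for s_u in (s_v, s_v + 1):
        assert compare_by_bit(s_u % 2, s_v % 2) == compare_full(s_u, s_v)

# Without the invariant the bit is ambiguous: a gap of two looks like equality.
print(compare_by_bit(2 % 2, 0 % 2), compare_full(2, 0))   # equal u ahead
```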

To realize this idea, we cannot allow a node to increment its session number as 
soon as it gets a reset request. Otherwise, multiple reset requests at the same node 
will cause the difference in session numbers to grow without bound. Thus nodes must 




coordinate before they increment session numbers. 

In Arora and Gouda's reset protocol [AG90], the coordination is done over a rooted 
tree. Arora and Gouda first show how to build a rooted tree in a stabilizing fashion. 
In what follows we will assume that the tree has already been built. Thus every node
$i$ has a pointer called $parent(i)$ that points to its parent in the tree, and the parent of
the root is a special value nil.

Given a tree, an immediate idea is to funnel all reset requests to the root. On receipt 
of a request, the root could send reset grants down the tree. Nodes could increment 
their session number on receiving a grant. Unfortunately, this does not work either 
because a node A in the tree may send a reset request and receive a grant before some 
other node B in the tree receives a grant. After getting its first grant, A may send 
another request and receive a second grant before B gets its first grant. Assuming 
that the session numbers are unbounded, the difference in the session numbers of A 
and B can grow without bound. 

Instead, the reset task is broken into three phases. In the first phase, a node sends
a reset request up the tree towards the root. In the second phase, the root sends a
reset wave down the tree. In the third phase, a completion wave travels back up the
tree, and the root waits until the reset wave has reached every node in the tree before
starting a new reset. After the system stabilizes 1, these three phases guarantee that a
single bit $sbit_i$ is sufficient to distinguish instances of the application protocol.

The three phases are implemented by a mode variable $mode_i$ at each node $i$. The
mode at node $i$ has one of three possible values: init, reset, and normal. All nodes are
in the normal mode when no reset is in progress. To initiate a reset, a node $i$ sets
$mode_i$ to init (this can be done only if both $i$ and its parent are in normal mode). A
reset request is propagated upwards by the action PROPAGATE_REQUEST_i, which sets
the mode of the parent to init when the mode of the child is init. A reset wave is begun
by the root by the action START_RESET_i, which sets the mode of the root to reset. The
reset wave propagates downwards by PROPAGATE_RESET_i, which sets the mode of a
child to reset if the mode of the parent is reset. When a node changes its mode to
reset, it flips its session number bit and resets the application protocol. Finally, the
completion wave is propagated by the action PROPAGATE_COMPLETION_i, which sets a
node's mode to normal when all the node's children have normal mode.



1 In this chapter, we will always use the execution stabilization definitions.

The automaton code for this implementation is shown in Figure 4.1 and Figure 4.2.
Notice that besides the actions we have already described, there is a CORRECT_i action
in Figure 4.2. This action was used in an earlier version [AG90] to ensure that the
reset protocol was stabilizing.

Informally, the reset protocol is stabilizing if after bounded time, any reset requests 
will cause the application protocol to be properly reset. The correction action in 
Figure 4.2 [AG90] ensures stabilization in a very ingenious way. However, the proof of 
stabilization is somewhat difficult and not as intuitive as one might like. The reader 
is referred to [AG90] for details. Instead, we will use local checking and correction to 
describe another correction procedure that is very intuitive. As a result, the proof of 
stabilization becomes transparent. 

We start by writing down the "good" states of the reset system in terms of link
predicates $L_{ij}$. We say that the system is in a good state if for all neighboring nodes
$i$ and $j$, the predicate $L_{ij}$ holds, where $L_{ij}$ is the conjunction of the two predicates:

• If $parent(i) = j$ and $mode_j \neq reset$, then $mode_i \neq reset$ and $sbit_i = sbit_j$.

• If $parent(i) = j$ and $mode_j = reset$, then either:

  - $mode_i \neq reset$ and $sbit_i \neq sbit_j$, OR

  - $sbit_i = sbit_j$.
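As a cross-check on the two predicates, the following Python sketch transcribes $L_{ij}$ directly and lists, for each state of a parent $j$, the child states it permits. Modes are encoded as strings and bits as 0/1; the function name `link_ok` is our own.

```python
from itertools import product

MODES = ("init", "normal", "reset")


def link_ok(mode_i: str, sbit_i: int, mode_j: str, sbit_j: int) -> bool:
    """L_ij for a child i whose parent is j, transcribed from the two predicates."""
    if mode_j != "reset":
        # First predicate: the child is not in reset mode and the bits agree.
        return mode_i != "reset" and sbit_i == sbit_j
    # Second predicate: the child has not yet noticed the reset (not in reset,
    # bits differ), or it has noticed it (bits agree).
    return (mode_i != "reset" and sbit_i != sbit_j) or sbit_i == sbit_j


for mode_j, sbit_j in product(MODES, (0, 1)):
    allowed = [(m, b) for m, b in product(MODES, (0, 1)) if link_ok(m, b, mode_j, sbit_j)]
    print(f"parent ({mode_j}, {sbit_j}) allows child states {allowed}")
```

For a parent in reset mode with bit 0, for example, the child may be in any mode with bit 0, or in init or normal mode with bit 1; only (reset, 1) is excluded.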

The predicates can be understood intuitively as describing states that occur when 
the reset system is working correctly. The first predicate says that if the parent's mode 
is not reset, then the child's mode is not reset and the two session bits are the same. 
This holds when the system is working correctly, for two reasons. First, the
child enters reset mode only when its parent is in that mode, and the parent does not 
leave reset mode until the child has left reset mode. Second, if the parent changes its 
session bit, the parent also goes into reset mode; and the child only changes its session 
bit when the parent's mode is reset. 

The second predicate describes the correct states during the second and third 
phases of the reset until the instant that the completion wave reaches j. It says 
that if the parent's mode is reset, then there are two possibilities. If the child has 
not "noticed" that the parent's state is reset, then the child's bit is not equal to the 




The state of the system consists of two variables for every process i in the tree:
   mode_i ∈ {init, normal, reset}
   sbit_i, a bit

PROPAGATE_REQUEST_i (*internal action to propagate a reset request upwards*)
   Preconditions:
      mode_i = normal
      i = parent(j) and mode_j = init
   Effects: mode_i := init

START_RESET_i (*internal action at root to start a reset wave*)
   Preconditions: mode_i = init and parent(i) = nil
   Effects:
      mode_i := reset; (*also reset application state at this node*)
      sbit_i := ~sbit_i; (*flip bit*)

PROPAGATE_RESET_i (*internal action to propagate reset downwards*)
   Preconditions:
      mode_i ≠ reset
      j = parent(i) and mode_j = reset and sbit_j ≠ sbit_i
   Effects:
      mode_i := reset; (*also reset application state at this node*)
      sbit_i := ~sbit_i; (*flip bit*)

PROPAGATE_COMPLETION_i (*internal action to propagate completion wave upwards*)
   Preconditions:
      mode_i = reset
      For all children j of i: mode_j = normal and sbit_i = sbit_j
   Effects: mode_i := normal

Every action is in a separate class with upper bound equal to t_n.

Figure 4.1: Normal Actions at node i in Arora and Gouda's Reset Protocol [AG90].
CORRECT_i (*extra internal action for correction at node i*)
   Preconditions:
      j = parent(i) ≠ nil
      (mode_j = mode_i) and (sbit_i ≠ sbit_j)
   Effects: sbit_i := sbit_j

Every action is in a separate class with upper bound equal to t_n.

Figure 4.2: Original correction action in Arora and Gouda's Reset Protocol [AG90].

parent's bit. (This follows because when the parent changes its mode to reset, the
parent also changes its bit; and just before such an action, the first predicate assures
us that the two bits are the same.) On the other hand, if the child has noticed that the
parent's state is reset, then the two bits are the same. (This follows because when the
child notices that the parent's mode is reset, the child sets its bit equal to the parent's
bit and does not change its bit until the parent changes its mode.)

Suppose that in some state $s$ these link predicates hold for all links in the tree.
Then Arora and Gouda [AG90] show that the system will execute reset requests
correctly in any execution starting from $s$. This is not very hard to believe. But it means
that all we have to do is add correction actions so that all link predicates become true
in bounded time.
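The claim can be exercised on a small example. The Python sketch below transcribes the four actions of Figure 4.1 for a hypothetical four-node tree, lets node 3 request a reset, runs sweeps over the nodes until no action is enabled, and asserts after every step that all link predicates $L_{ij}$ still hold. The scheduler (fixed sweeps in node order) and all names are assumptions of the sketch, and the application state itself is not modelled.

```python
NORMAL, INIT, RESET = "normal", "init", "reset"


def children(i, parent):
    return [c for c, p in parent.items() if p == i]


def link_ok(i, parent, mode, sbit):
    """L_ij for node i and its parent j = parent[i]."""
    j = parent[i]
    if mode[j] != RESET:
        return mode[i] != RESET and sbit[i] == sbit[j]
    return (mode[i] != RESET and sbit[i] != sbit[j]) or sbit[i] == sbit[j]


def step(i, parent, mode, sbit):
    """Fire one enabled action of Figure 4.1 at node i; return its name, or None."""
    j, kids = parent[i], children(i, parent)
    if mode[i] == NORMAL and any(mode[c] == INIT for c in kids):
        mode[i] = INIT
        return "PROPAGATE_REQUEST"
    if mode[i] == INIT and j is None:
        mode[i], sbit[i] = RESET, 1 - sbit[i]    # the application would also be reset here
        return "START_RESET"
    if j is not None and mode[i] != RESET and mode[j] == RESET and sbit[j] != sbit[i]:
        mode[i], sbit[i] = RESET, 1 - sbit[i]    # the application would also be reset here
        return "PROPAGATE_RESET"
    if mode[i] == RESET and all(mode[c] == NORMAL and sbit[c] == sbit[i] for c in kids):
        mode[i] = NORMAL
        return "PROPAGATE_COMPLETION"
    return None


parent = {0: None, 1: 0, 2: 0, 3: 1}      # node 0 is the root
mode = {i: NORMAL for i in parent}
sbit = {i: 0 for i in parent}
mode[3] = INIT                            # node 3 and its parent are normal, so it may request

log = []
fired = True
while fired:
    fired = False
    for i in parent:                      # one sweep over the nodes
        action = step(i, parent, mode, sbit)
        if action:
            log.append((action, i))
            fired = True
        assert all(link_ok(k, parent, mode, sbit) for k in parent if parent[k] is not None)

print(log)
print(mode, sbit)   # every node is back in normal mode and every session bit is flipped
```

With this sweep order the log records PROPAGATE_REQUEST at nodes 1 and 0, START_RESET at the root, PROPAGATE_RESET at nodes 1, 2 and 3, and finally PROPAGATE_COMPLETION at nodes 2, 3, 1 and 0.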

The tree topology once again suggests a simple strategy. We remove the old action
CORRECT_i in Figure 4.2 and add a new action CORRECT_CHILD_i as shown in
Figure 4.3. Basically, CORRECT_CHILD_i checks whether the link predicate on the link
between $i$ and its parent $j$ is true. If not, $i$ changes its state so that $L_{ij}$ becomes true.
Notice that CORRECT_CHILD_i leaves the state of $i$'s parent unchanged. Suppose $j$ is
the parent of $i$ and $k$ is the parent of $j$. Then, since $L_{jk}$ depends only on the states
of $j$ and $k$, CORRECT_CHILD_i will leave $L_{jk}$ true if $L_{jk}$ was true in the previous state.

Thus we have an important stability property: correcting a link does not affect the 
CORRECT_CHILD_i (*modified correction action at node i*)
   Preconditions:
      j = parent(i) ≠ nil
      L_ij does not hold
   Effects:
      sbit_i := sbit_j
      mode_i := mode_j

All actions are in a separate class with upper bound t_n.

Figure 4.3: Modified Correction action for Arora and Gouda's Reset Protocol. All other actions are as in
Figure 4.1.



correctness of links above it in the tree. Using this we can show that in bounded time, 
all links will be in a good state and so the system is in a good state. Rather than 
prove that this modified automaton stabilizes, we will prove a more general result in 
the next section: that any locally checkable tree automaton can be locally corrected 
into a good global state. 
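A quick experiment illustrates the bounded-time claim for CORRECT_CHILD_i on its own. The Python sketch below (names hypothetical) starts a six-node tree in random corrupted states, repeatedly sweeps CORRECT_CHILD_i over the nodes in the least favourable order (deepest nodes first, so a parent may change after its child was fixed), and checks that every link predicate holds after at most $height(T)$ sweeps. Normal actions are deliberately left out; this is a sanity check of the stability argument, not a proof.

```python
import random

MODES = ("init", "normal", "reset")


def link_ok(mode_i, sbit_i, mode_j, sbit_j):
    """L_ij for a child i with parent j."""
    if mode_j != "reset":
        return mode_i != "reset" and sbit_i == sbit_j
    return (mode_i != "reset" and sbit_i != sbit_j) or sbit_i == sbit_j


def correct_child(i, parent, mode, sbit):
    """CORRECT_CHILD_i of Figure 4.3: copy the parent's bit and mode if L_ij fails."""
    j = parent[i]
    if j is None or link_ok(mode[i], sbit[i], mode[j], sbit[j]):
        return False
    sbit[i], mode[i] = sbit[j], mode[j]
    return True


def depth(i, parent):
    d = 0
    while parent[i] is not None:
        i, d = parent[i], d + 1
    return d


def sweeps_to_repair(parent, mode, sbit):
    """Count sweeps (deepest nodes first) until every link predicate holds."""
    order = sorted(parent, key=lambda i: -depth(i, parent))
    sweeps = 0
    while not all(link_ok(mode[i], sbit[i], mode[parent[i]], sbit[parent[i]])
                  for i in parent if parent[i] is not None):
        for i in order:
            correct_child(i, parent, mode, sbit)
        sweeps += 1
    return sweeps


parent = {0: None, 1: 0, 2: 0, 3: 1, 4: 3, 5: 2}
height = max(depth(i, parent) for i in parent)     # 3 for this tree
rng = random.Random(7)
worst = 0
for _ in range(1000):
    mode = {i: rng.choice(MODES) for i in parent}
    sbit = {i: rng.randint(0, 1) for i in parent}
    worst = max(worst, sweeps_to_repair(parent, mode, sbit))
print("worst case:", worst, "sweeps; height(T) =", height)
```

After sweep $k$, every link whose child is at depth at most $k$ is good and stays good, because CORRECT_CHILD_i never changes the parent; that is why the count cannot exceed the height.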

We will return to the network reset problem in Chapter 7. Our stabilizing reset 
protocol is more efficient than the reset protocol of [AG90] and is also designed to 
work in a message passing model. 



4.3 Tree Correction for Shared Memory Systems 

In the last section, we described informally the problem of stabilizing a reset protocol 
on a tree. We also suggested a technique of adding correction actions to every node. 
That example motivates us to ask whether there is a general result for trees. To 
describe and prove such a general result, we start with the following definitions. 

We will continue to model a network as a single automaton in which a node can
read the state of its neighbors and write its own state in a single move using an internal
action. Formally:

A shared memory network automaton $\mathcal{N}$ for graph $G = (E, V)$ is an automaton in
which:

• The state of $\mathcal{N}$ is the cross-product of a set of node states, $S_u(\mathcal{N})$, one for each
node $u \in V$. For any state $s$ of $\mathcal{N}$, we use $s|u$ to denote $s$ projected onto $S_u(\mathcal{N})$.
This is also read as the state of node $u$ in global state $s$.

• All actions of $\mathcal{N}$ are internal actions and are partitioned into sets, $A_u(\mathcal{N})$, one
for each node $u \in V$.

• Suppose $(s, \pi, \tilde{s})$ is a transition of $\mathcal{N}$ and $\pi$ belongs to $A_u(\mathcal{N})$. Consider any state
$s'$ of $\mathcal{N}$ such that $s'|u = s|u$ and $s'|v = s|v$ for all neighbors $v$ of $u$. Then there
is some transition $(s', \pi, \tilde{s}')$ of $\mathcal{N}$ such that $\tilde{s}'|v = \tilde{s}|v$ for $v = u$ and for all
neighbors $v$ of $u$ in $G$.

• Suppose $(s, \pi, \tilde{s})$ is a transition of $\mathcal{N}$ and $\pi$ belongs to $A_u(\mathcal{N})$. Then $\tilde{s}|v = s|v$
for all $v \neq u$.

Informally, the third condition requires that the transitions of a node $u \in V$ only
depend on the state of node $u$ and the states of the neighbors of $u$ in $G$. The fourth
condition requires that the effect of a transition assigned to node $u \in V$ can only be
to change the state of $u$.
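One way to obtain the third and fourth conditions by construction is to give each node action only a view of its own state and its neighbors' states, and to let it return only a new state for its own node. The Python sketch below does this; `step` and the example action `copy_max` are hypothetical names, and node states are plain integers for brevity.

```python
from typing import Callable, Dict, List, Optional

# A node action receives u's own state and a copy of u's neighbors' states,
# and returns u's new state, or None if the action is not enabled.
LocalAction = Callable[[int, Dict[int, int]], Optional[int]]


def step(states: Dict[int, int], neighbors: Dict[int, List[int]],
         u: int, action: LocalAction) -> bool:
    """Try to fire `action` at node u; return True if it was enabled."""
    view = {v: states[v] for v in neighbors[u]}   # third condition: only neighbors are visible
    new_state = action(states[u], view)
    if new_state is None:
        return False
    states[u] = new_state                         # fourth condition: only u's state changes
    return True


def copy_max(own: int, nbrs: Dict[int, int]) -> Optional[int]:
    """Example action: adopt the largest value among the neighbors, if it is larger."""
    best = max(nbrs.values(), default=own)
    return best if best > own else None


neighbors = {0: [1], 1: [0, 2], 2: [1]}
states = {0: 5, 1: 0, 2: 0}
print(step(states, neighbors, 2, copy_max))   # False: node 2 cannot see node 0
print(step(states, neighbors, 1, copy_max))   # True: node 1 copies 5 from node 0
print(step(states, neighbors, 2, copy_max))   # True: node 2 now copies 5 from node 1
print(states)                                 # {0: 5, 1: 5, 2: 5}
```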

A shared memory tree automaton is a shared memory network automaton where $G$
is a rooted tree. Thus for any node $i$ in a tree automaton, we assume there is a value
$parent(i)$ that points to the parent of node $i$ in the tree. There is also a unique root
node $r$ that has $parent(r) = nil$. For our purposes, it is convenient to model the parent
values as being part of the code at each node. More generally, the parent pointers 
could be variables that are set by a stabilizing spanning tree protocol as shown in 
[AG90]. 

In this chapter, we will often use the phrase "tree automaton" to mean a "shared 
memory tree automaton" and the phrase "network automaton" 2 to mean a "shared 
memory network automaton". 



2 In all subsequent chapters, the terms tree and network automaton have different meanings. 
A closed predicate 3 of an automaton $A$ is a predicate $L$ such that for any transition
$(s, \pi, \tilde{s})$ of $A$, if $s \in L$ then $\tilde{s} \in L$.

A link subsystem of a tree automaton is an ordered pair $(u,v)$ such that $u$ and
$v$ are neighbors in the tree. To distinguish states of the entire automaton from the
states of its subsystems we will sometimes use the word global state to denote a state
of the entire automaton. For any global state $s$ of a network automaton, we define
$(s|u, s|v)$ to be the state of the $(u,v)$ link subsystem. Thus the state of the $(v,u)$ link
subsystem in global state $s$ is $(s|v, s|u)$.

A local predicate $L_{u,v}$ of a tree automaton is a subset of the states of a $(u,v)$ link
subsystem.

A link predicate set $\mathcal{L}$ for a tree automaton is a set that contains exactly one predicate
for every link subsystem in the tree and which satisfies the following symmetry
condition: for each pair of neighbors $u$ and $v$, if $(a, b) \in L_{u,v}$, then $(b, a) \in L_{v,u}$. (That is,
while a link predicate set has two link predicates for each pair of neighbors, these two
predicates are identical except for the order in which the states are written down.) We
will also assume that every link predicate set is non-trivial in that there is at least one
global state $s$ such that $(s|u, s|v) \in L_{u,v}$ for all link subsystems $(u,v)$ in the tree.

A tree automaton is locally checkable for predicate $L$ if there is some link predicate
set $\mathcal{L} = \{L_{u,v}\}$ such that:

$$L \supseteq \{s : (s|u, s|v) \in L_{u,v} \text{ for all link subsystems } (u,v) \text{ in the tree}\}$$

In other words, the global state of the automaton satisfies $L$ if every link subsystem
$(u,v)$ satisfies $L_{u,v}$.
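For very small trees, these definitions can be checked by brute force. The Python sketch below (all names hypothetical) represents a link predicate set as a function `in_link_pred(u, v, a, b)` meaning $(a, b) \in L_{u,v}$, tests the symmetry condition, and tests local checkability together with non-triviality by enumerating every global state. The enumeration is exponential in the number of nodes, so this is only an illustration.

```python
from itertools import product


def link_subsystems(parent):
    """All ordered pairs (u, v) of tree neighbors: each tree edge gives two subsystems."""
    pairs = []
    for u, v in parent.items():
        if v is not None:
            pairs += [(u, v), (v, u)]
    return pairs


def all_links_good(g, parent, in_link_pred):
    """True iff (g[u], g[v]) is in L_{u,v} for every link subsystem (u, v)."""
    return all(in_link_pred(u, v, g[u], g[v]) for u, v in link_subsystems(parent))


def is_symmetric(parent, in_link_pred, node_states):
    """Symmetry condition: (a, b) in L_{u,v} exactly when (b, a) in L_{v,u}."""
    return all(in_link_pred(u, v, a, b) == in_link_pred(v, u, b, a)
               for u, v in link_subsystems(parent)
               for a, b in product(node_states, repeat=2))


def locally_checkable(parent, in_link_pred, node_states, in_L):
    """Non-triviality plus: every global state whose links are all good lies in L."""
    nodes = list(parent)
    all_states = (dict(zip(nodes, c)) for c in product(node_states, repeat=len(nodes)))
    good = [g for g in all_states if all_links_good(g, parent, in_link_pred)]
    return bool(good) and all(in_L(g) for g in good)


def agree(u, v, a, b):
    return a == b                       # example L_{u,v}: neighbors hold equal values


def all_equal(g):
    return len(set(g.values())) == 1


def node3_is_zero(g):
    return g[3] == 0


parent = {0: None, 1: 0, 2: 0, 3: 2}
print(is_symmetric(parent, agree, (0, 1, 2)))                        # True
print(locally_checkable(parent, agree, (0, 1, 2), all_equal))        # True
print(locally_checkable(parent, agree, (0, 1, 2), node3_is_zero))    # False
```

The third call fails because the good global state in which every node holds 1 lies outside the predicate.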

Recall the definition of stabilization in terms of executions and the definition of 
an unrestricted automaton from Chapter 3. Recall that we use $A|L$ to denote the
automaton that is identical to $A$ except that the start states of $A|L$ are the states in
set L. For any rooted tree T, we let height(T) denote the maximum length of a path 
between the root and a leaf in T. 

We can now state a simple theorem. 



3 This is often called a stable predicate. We avoid this phrase because of potential confusion with 
stabilization. 
Theorem 4.3.1 Tree Correction in Shared Memory Systems: Consider any
tree automaton $\mathcal{T}$ for tree $T$ that is locally checkable for predicate $L$. Then there exists
an unrestricted tree automaton $\mathcal{T}^+$ for $T$ such that $\mathcal{T}^+$ stabilizes to the executions of
$\mathcal{T}|L$ in time proportional to $height(T)$.

Thus after a time proportional to the height of the tree, any execution of the new
automaton $\mathcal{T}^+$ will "look like" an execution of $\mathcal{T}$ that starts with a state in which $L$
holds. To prove this theorem we first describe how to construct $\mathcal{T}^+$ from $\mathcal{T}$ and then
show that $\mathcal{T}^+$ satisfies the requirements of the theorem.

Assume that $\mathcal{T}$ is locally checkable for predicate $L$ using link predicate set $\mathcal{L} =
\{L_{u,v}\}$. We start by defining the set of global states that satisfy all local predicates.
Let $L' = \{s : (s|u, s|v) \in L_{u,v} \text{ for all link subsystems } (u,v) \text{ in the tree}\}$. Clearly
$L \supseteq L'$. Also, because of the non-triviality of the link predicate set, $L'$ is not the empty
set. To construct $\mathcal{T}^+$ from $\mathcal{T}$ we do the following:

• We first normalize all node states in $\mathcal{T}$. Intuitively, we remove all states in the
state set of a node $u$ that are not part of a global state that satisfies $L'$. Thus
$S_u(\mathcal{T}^+) := \{s|u : s \in L'\}$. The state set of $\mathcal{T}^+$ is just the cross-product of the
normalized state sets of all nodes. Intuitively, this rules out useless node states
that never occur in global states that satisfy all local predicates.

• We retain all the actions of $\mathcal{T}$ but we add an extra precondition (i.e., an extra
guard) to each action $a_u \in A_u$ of $\mathcal{T}$ as shown in Figure 4.4. Intuitively, this extra
guard ensures that a normal action of $\mathcal{T}$ is not taken at node $u$ unless all links
adjacent to $u$ are in "good" states. All actions of $\mathcal{T}$ remain in the same classes
in $\mathcal{T}^+$.

• We add an extra correction action CORRECT_u for every node $u$ in the tree that
is not the root. CORRECT_u is also described in Figure 4.4. Intuitively, this extra
action "corrects" the link between node $u$ and its parent if this link is not in
a "good" state. Each CORRECT_u action is put in a separate class with upper
bound equal to $t_n$.

We outline a proof of the theorem by a series of lemmas. The first thing a care- 
ful reader needs to be convinced about is that the code in Figure 4.4 is realizable. 
The state of 𝒯+ is identical to 𝒯 except that the state set
of each node is normalized to {s|u : s ∈ L'}

Modified Action a_u, a_u ∈ A_u (*modification of action a_u in 𝒯*)
   Preconditions:
      Exactly as in a_u except for the additional condition:
      For all neighbors v of u: (s|u, s|v) ∈ L_{u,v}
   Effects:
      Exactly as in a_u

CORRECT_u (*extra correction action for all nodes except the root*)
   Preconditions: (parent(u) = v) and ((s|u, s|v) ∉ L_{u,v})
   Effects:
      Let a be any state in S_u(𝒯+) such that (a, s|v) ∈ L_{u,v}
      Change the state of node u to a

Each CORRECT_u action is in a separate class with upper bound t_n

Figure 4.4: Augmenting 𝒯 to create 𝒯+
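The construction is mechanical enough to sketch generically. The Python code below builds the three ingredients of Figure 4.4 for a finite tree automaton given by explicit node state sets: the normalized state sets, a wrapper that adds the extra guard to an action $a_u$, and a CORRECT_u that picks the first normalized state repairing the link to the parent. It computes $L'$ by enumerating all global states, so it only suits tiny examples. The demonstration automaton (a root holding a value and a bit, other nodes holding a value, and $L_{u,v}$ requiring equal values) and all names are hypothetical.

```python
from itertools import product


def plus_construction(parent, states, link_ok):
    """Brute-force sketch of Figure 4.4 (tiny trees only).

    parent: node -> parent node, or None for the root
    states: node -> list of node states of T
    link_ok(u, v, a, b): encodes (a, b) in L_{u,v}; assumed symmetric
    """
    nodes = list(parent)
    nbrs = {u: [v for v in nodes if parent[v] == u] + ([parent[u]] if parent[u] is not None else [])
            for u in nodes}

    def good(g):
        # By symmetry it suffices to check each (child, parent) link once.
        return all(link_ok(u, parent[u], g[u], g[parent[u]]) for u in nodes if parent[u] is not None)

    # L' = global states satisfying every link predicate; S_u(T+) = {s|u : s in L'}.
    l_prime = [g for g in (dict(zip(nodes, c)) for c in product(*(states[u] for u in nodes)))
               if good(g)]
    normalized = {u: sorted({g[u] for g in l_prime}) for u in nodes}

    def guarded(u, action):
        """Modified action a_u: enabled only if every link adjacent to u is good."""
        def run(g):
            if all(link_ok(u, v, g[u], g[v]) for v in nbrs[u]):
                return action(g)        # the new state of u (applying it is left to the caller)
            return None
        return run

    def correct(u, g):
        """CORRECT_u: if the link to the parent is bad, move u to a state that repairs it."""
        v = parent[u]
        if v is None or link_ok(u, v, g[u], g[v]):
            return False
        g[u] = next(a for a in normalized[u] if link_ok(u, v, a, g[v]))
        return True

    return normalized, guarded, correct


def value(x):
    return x[0] if isinstance(x, tuple) else x     # the root's state is (value, bit)


def link_ok(u, v, a, b):
    return value(a) == value(b)                    # example L_{u,v}: neighbors hold equal values


parent = {0: None, 1: 0, 2: 0, 3: 1}
states = {0: [(0, 0), (0, 1), (1, 0), (1, 1)], 1: [0, 1, 2], 2: [0, 1, 2], 3: [0, 1, 2]}
normalized, guarded, correct = plus_construction(parent, states, link_ok)
print(normalized[1])                  # [0, 1]: value 2 never occurs in a good global state

# Example root action: flip the root's bit (it never changes the value, so L' stays closed).
toggle = guarded(0, lambda g: (g[0][0], 1 - g[0][1]))
g = {0: (1, 0), 1: 0, 2: 1, 3: 1}     # links (1, 0) and (3, 1) start out bad
print(toggle(g))                      # None: the guard blocks the root while link (1, 0) is bad
for u in (1, 2, 3):                   # one top-down sweep of correction actions
    correct(u, g)
print(g, toggle(g))                   # {0: (1, 0), 1: 1, 2: 1, 3: 1} (1, 1)
```

Choosing the first repairing state is just one admissible reading of "let $a$ be any state" in Figure 4.4; any choice from the normalized set would do.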
The careful reader will have noticed that we made two assumptions. First, in the
CORRECT_u action, we assumed that for any link subsystem $(u,v)$ of $\mathcal{T}^+$ and any state
$b$ of node $v$ there is some $a$ such that $(a, b) \in L_{u,v}$. Second, we assumed that when
a modified action $a_u$ is taken at node $u$, the resulting state of node $u$ has not been
removed as part of the normalization step.

We will begin with a lemma showing that the first assumption is a safe one. We 
show that the second assumption is safe later. 

Lemma 4.3.2 For any link subsystem $(u,v)$ of $\mathcal{T}^+$ and for any state $a$ of node $u$ there
is some $b$ such that $(a, b) \in L_{u,v}$.

Proof: We know that for any state $a$ of $u$ there is some state $s \in L'$ such that $s|u = a$.
This follows because all node states have been normalized and because $L'$ is not empty.
Then we choose $b = s|v$. ∎

Applying the lemma to the $(v,u)$ subsystem and using the symmetry condition yields
exactly the first assumption.

The next lemma shows a local extensibility property. It says that if any node u and 
its neighbors have node states such that the links between u and its neighbors are in 
good states, then this set of node states can be extended to form a good global state. 

Lemma 4.3.3 Consider a node $u$ and some global state $s$ of $\mathcal{T}^+$ such that for all
neighbors $v$ of $u$, $(s|u, s|v) \in L_{u,v}$. Then there is some global state $s' \in L'$ such that
$s'|u = s|u$ and $s'|v = s|v$ for all neighbors $v$ of $u$.

Proof: We create a global state $s'$ by assigning node states to each node in the tree
such that for every link subsystem $(x,y)$, the state of the subsystem is in $L_{x,y}$. Start
by assigning node state $s|u$ to $u$ and $s|v$ to all neighbors $v$ of $u$. At every stage of the
iteration we label a node $x$ that has not been assigned a state and is a neighbor
of a node $y$ that has been assigned a state. Because the topology is a tree and the
labelled nodes always form a connected subtree, $y$ is the only labelled neighbor of $x$,
so only one link constraint has to be met; by Lemma 4.3.2, we can choose the state of
$x$ so that the state of the subsystem containing $x$ and $y$ is in $L_{x,y}$. Eventually we label
all nodes in the tree, and the resulting global state is in $L'$, once again because for
every link subsystem $(x,y)$, the state of the $(x,y)$ subsystem is in $L_{x,y}$. The labelling
procedure depends crucially on the fact that the topology is a tree. ∎
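The labelling procedure is a breadth-first traversal starting from $u$ and its neighbors. The Python sketch below (names hypothetical) implements it with a caller-supplied `choose` function standing in for the state promised by Lemma 4.3.2. It then shows why the tree assumption matters: for a predicate requiring neighboring values to differ by one modulo 4, every single link can be satisfied, yet no global state satisfies it around a triangle.

```python
from collections import deque
from itertools import product


def extend(adj, partial, choose):
    """Extend a partial assignment to the whole tree, as in the proof of Lemma 4.3.3.

    adj: node -> list of neighbors (assumed to form a tree)
    partial: states already fixed for u and its neighbors
    choose(x, b): a state of x compatible with a labelled neighbor in state b
    """
    state = dict(partial)
    frontier = deque(partial)
    while frontier:
        y = frontier.popleft()
        for x in adj[y]:
            if x not in state:
                # In a tree, y is x's only labelled neighbor, so one constraint suffices.
                state[x] = choose(x, state[y])
                frontier.append(x)
    return state


def ok(a, b):
    return (a - b) % 4 in (1, 3)      # example L_{x,y}: neighboring values differ by 1 mod 4


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
s = extend(path, {2: 0, 1: 1, 3: 3}, lambda x, b: (b + 1) % 4)
print(s)                                                      # {2: 0, 1: 1, 3: 3, 0: 2, 4: 0}
print(all(ok(s[x], s[y]) for x in path for y in path[x]))     # True

# On a triangle (not a tree) no global state satisfies the same predicate on every edge.
print(any(ok(a, b) and ok(b, c) and ok(c, a) for a, b, c in product(range(4), repeat=3)))  # False
```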

To prove the theorem, we will use the Execution Convergence Theorem (3.4.5). 
However, to apply that theorem we have to work with predicates of $\mathcal{T}^+$ (i.e., sets of
states of $\mathcal{T}^+$) and not link predicates of $\mathcal{T}^+$ (i.e., sets of states of link subsystems of
$\mathcal{T}^+$). This is just a technicality that we deal with as follows. For each link subsystem
$(u,v)$, we define the predicate $L'_{uv} = \{s : (s|u, s|v) \in L_{u,v}\}$. Clearly, $L' = \bigcap_{(u,v)} L'_{uv}$.

Next consider some $u, v, w$ such that $v = parent(u)$ and $w = parent(v)$. We assume
that $v \neq nil$. (But $v$ may be the root, in which case $w$ is nil.) The next lemma states
an important stability property. It states that if $L'_{uv}$ holds in some global state $s$ of
$\mathcal{T}^+$, it will remain true in any successor state of $s$ if either:

• $v$ is the root, OR

• $L'_{vw}$ is also true in $s$.

Lemma 4.3.4 Consider some $u, v, w$ such that $v = parent(u) \neq nil$ and $w = parent(v)$.
Suppose there is some global state $s$ of $\mathcal{T}^+$ such that $s \in L'_{uv}$ and $(w \neq nil) \rightarrow s \in L'_{vw}$.
Then for any transition $(s, \pi, \tilde{s})$, $\tilde{s} \in L'_{uv}$.

Proof: It suffices to consider all possible actions $\pi$ that can be taken at either $u$ or $v$
in state $s$. We don't have to consider correction actions because, by assumption,
neither the CORRECT_u nor the CORRECT_v action is enabled in state $s$.

Consider a modified action $a_u$ of $\mathcal{T}^+$ that is taken at node $u$. Suppose action $a_u$
occurs in state $s$ and results in a state $\tilde{s}$. By the preconditions of action $a_u$,
$(s|u, s|x) \in L_{u,x}$ for all neighbors $x$ of $u$ (its parent $v$ and all its children). But in
that case, by Lemma 4.3.3, there is some other global state $s' \in L'$ such that $s'|u = s|u$,
$s'|v = s|v$, and $s'|x = s|x$ for all children $x$ of $u$. Thus, by the third property of a
network automaton, the action $a_u$ is also enabled in $s'$ and, if taken in $s'$, will result
in some state, say $\tilde{s}'$. But since $L'$ is closed, $\tilde{s}' \in L'$ and hence $(\tilde{s}'|u, \tilde{s}'|v) \in L_{u,v}$.
But by the third property of a network automaton, $\tilde{s}|u = \tilde{s}'|u$ and $\tilde{s}|v = \tilde{s}'|v$.
Thus $\tilde{s} \in L'_{uv}$.

The case of a modified action at $v$ is similar. ∎

The previous lemma also shows that our second assumption is safe. If a modified
action is taken at a node $u$, resulting in state $\tilde{s}$, then the argument in its proof shows
that $\tilde{s} \in L'_{uv}$ for every neighbor $v$ of $u$. Thus by Lemma 4.3.3 there is some other
state $s' \in L'$ such that $s'|u = \tilde{s}|u$. Thus $\tilde{s}|u$ cannot have been removed as part of the
normalization step.

The next lemma states an obvious liveness property. If $L'_{uv}$ does not hold in some
global state of $\mathcal{T}^+$, then within at most $t_n$ time units we will reach some
global state in which $L'_{uv}$ holds. Clearly this is guaranteed by the correction actions
(either CORRECT_u or CORRECT_v, depending on whether $u$ is the child of $v$ or vice
versa) and by the timing guarantees.

Lemma 4.3.5 For any $(u,v)$ link subsystem, any execution $\alpha$ of $\mathcal{T}^+$, and
any state $s_i$ of $\alpha$, if $s_i \notin L'_{uv}$ then there is some later state $s_j \in L'_{uv}$ that occurs within
$t_n$ time units of $s_i$.

Proof: Suppose not. Then either CORRECT_u (if $u$ is the child of $v$) or CORRECT_v (if
$v$ is the child of $u$) is continuously enabled for $t_n$ time units after $s_i$. But then, by the
timing guarantees, either CORRECT_u or CORRECT_v must occur within $t_n$ time after $s_i$,
resulting in a state in which $L'_{uv}$ (and, of course, $L'_{vu}$) holds, a contradiction. ∎

We now return to the proof of the theorem. First we define a natural partial order
on the predicates $L'_{uv}$. For any link subsystem $(u,v)$, define the child node of the
subsystem to be $u$ if $parent(u) = v$ and $v$ otherwise. Define the ordering $<$ such that
$L'_{uv} < L'_{wx}$ iff the child node of the $(u,v)$ subsystem is an ancestor (in the tree $T$) of
the child node of the $(w,x)$ subsystem.

Using this partial order and Lemmas 4.3.4 and 4.3.5, we can apply the Execution
Convergence Theorem (Theorem 3.4.5) to show that $\mathcal{T}^+$ stabilizes to the executions
of $\mathcal{T}^+|L'$ in time $height(T) \cdot t_n$. But any execution $\alpha$ of $\mathcal{T}^+|L'$ is also an execution
of $\mathcal{T}|L$. This follows from three observations. First, since $L'$ is closed for $\mathcal{T}$, $L'$ is
closed for $\mathcal{T}^+$. Second, if $L'$ holds in all states of an execution $\alpha$ of $\mathcal{T}^+|L'$, then no
correction actions can occur in $\alpha$. Third, any execution of $\mathcal{T}|L'$ is also an execution
of $\mathcal{T}|L$ because $L \supseteq L'$. Thus we conclude that $\mathcal{T}^+$ stabilizes to the executions of $\mathcal{T}|L$
in time $height(T) \cdot t_n$.

This theorem can be used as the basis of a design technique. We start by designing
a tree automaton $\mathcal{T}$ that is locally checkable for some $L$. Next we use the construction
in the theorem to convert $\mathcal{T}$ into $\mathcal{T}^+$. $\mathcal{T}^+$ stabilizes to the executions of $\mathcal{T}|L$ even when
started from an arbitrary state.

4.3.1 Weakening the Fairness Requirement 

In the previous subsection, we assigned each CORRECT_u action to a separate class.
Actually, the theorem only requires a property we call eventual correction: if a CORRECT_u
action is continuously enabled, then a CORRECT_u action occurs within bounded time.
It is interesting that the protocols in [Dij74, AG90] only require that some enabled
action in the entire network occur in bounded time. In other words, all the actions in 
the entire automaton can be placed in a single class. How can we ensure the eventual 
correction property in a model in which the only guarantee is that some enabled 
action (in the entire network) will occur in bounded time? To make sure the eventual 
correction property holds in such a model, we need to show that it is impossible to 
remain for an unbounded amount of time in a state in which $L_{u,v}$ does not hold and
in which some action other than CORRECT_u is enabled. This property can be
established 4 quite easily for the protocols in [Dij74] and [AG90]. 

4.4 Rediscovering Dijkstra's Protocols 

In this section, we will begin by reconsidering the second example in [Dij74]. This
protocol is essentially a token passing protocol on a line of nodes with process indices
ranging from 0 to $n-1$. Imagine that the line is drawn vertically so that Process 0 is
at the bottom of the line (and hence is called "bottom") and Process $n-1$ is at the
top of the line (and called "top"). This is shown in Figure 4.5. The down neighbor
of Process $i$ is Process $i-1$ and the up neighbor is Process $i+1$. Process $n-1$ and
Process 0 are not connected.

Dijkstra observed that it is impossible (without randomization) to solve mutual 
exclusion in a stabilizing fashion if all processes have identical code. To break symme- 
try, he made the code for the "top" and "bottom" processes different from the code 
for the others. 

Dijkstra's second example is modelled by the automaton D2 shown in Figure 4.6.
Each process $i$ has a boolean variable $up_i$ and a bit $x_i$. Roughly, $up_i$ is a pointer
at node $i$ that points in the direction of the token, and $x_i$ is a bit that is used to
implement token passing. Figure 4.5 shows a state of this protocol when it is working
correctly. First, there is exactly one pair of consecutive nodes whose up pointers differ
in value, and the token is at one of these two nodes. Second, if the two bits at these two
nodes are different (as in the figure), then the token is at the upper node; otherwise the
token is at the lower node.



4 I am grateful to Anish Arora for pointing this out to me.
[Figure: processes drawn vertically from Bottom (Process 0) to Top (Process n-1); the upper processes have up = false, the lower processes have up = true, and the token is marked at the boundary between them.]

Figure 4.5: Dijkstra's protocol for token passing on a line

For the present, assume that all processes start with $x_i = 0$. Also, initially assume
that $up_i = false$ for all processes other than Process 0. We will remove the need for
such initialization below. We start by understanding the correct executions of this
protocol when it has been correctly initialized.

A process $i$ is said to have the token when any action at Process $i$ is enabled. As
usual, the system is correct when there is at most one token in the system. Now, it is
easy to see that in the initial state only MOVE_UP_0 is enabled. Once node 0 makes
a move, MOVE_UP_1 is enabled, followed by MOVE_UP_2 and so on, as the "token"
travels up the line. Finally the token reaches node $n-1$, and we reach a state $s$ in
which $x_i = x_{i+1}$ for $i = 0 \ldots n-3$ and $x_{n-1} \neq x_{n-2}$. Also in state $s$, $up_i = true$
for $i = 0 \ldots n-2$ and $up_{n-1} = false$. Thus in state $s$, MOVE_DOWN_{n-1} is enabled,
and the token begins to move down the line by executing MOVE_DOWN_{n-2}, followed
by MOVE_DOWN_{n-3} and so on, until the token returns to Process 0 in a state that is
identical to the initial state except that every bit has been complemented. Then the cycle
continues. Thus in correct executions, the "token" is passed up and down the line.
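The cycle just described is easy to reproduce. The Python sketch below transcribes the actions of Figure 4.6 (with $up_0$ and $up_{n-1}$ held fixed), starts from the initial state for $n = 4$, and at each step asserts that exactly one action is enabled before firing it. The encoding of actions as (name, process) pairs is our own.

```python
def enabled(x, up):
    """Enabled actions of D2 (Figure 4.6) as (name, process) pairs."""
    n = len(x)
    acts = []
    if x[0] == x[1] and not up[1]:
        acts.append(("MOVE_UP", 0))
    for i in range(1, n - 1):
        if x[i] != x[i - 1]:
            acts.append(("MOVE_UP", i))
        if x[i] == x[i + 1] and up[i] and not up[i + 1]:
            acts.append(("MOVE_DOWN", i))
    if x[n - 2] != x[n - 1]:
        acts.append(("MOVE_DOWN", n - 1))
    return acts


def fire(x, up, act):
    """Apply the effect of one action; up[0] and up[n-1] are never changed."""
    name, i = act
    n = len(x)
    if name == "MOVE_UP" and i == 0:
        x[0] = 1 - x[0]
    elif name == "MOVE_DOWN" and i == n - 1:
        x[n - 1] = x[n - 2]
    elif name == "MOVE_UP":
        x[i], up[i] = x[i - 1], True       # point upwards, where the token went
    else:
        up[i] = False                      # point downwards, where the token went


x, up = [0, 0, 0, 0], [True, False, False, False]
trace = []
for _ in range(12):
    acts = enabled(x, up)
    assert len(acts) == 1                  # exactly one token in every state
    trace.append(acts[0][1])
    fire(x, up, acts[0])
print(trace)          # [0, 1, 2, 3, 2, 1, 0, 1, 2, 3, 2, 1]
print(x, up)          # [0, 0, 0, 0] [True, False, False, False]
```

After six moves every bit has been complemented; after twelve the initial state recurs.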

In the good states for Dijkstra's second example, the line can be partitioned into two
bands as shown in Figure 4.7. All bit and pointer values within a band are equal. (If a
node within a band has $up_i = false$, we sketch it as a pointer that points downwards.)
All nodes within the upper band point downwards and all nodes within the lower band
The state of the system consists of a boolean variable
up_i and a bit x_i, one for every process in the line.

We will assume that up_0 = true and up_{n-1} = false by definition

In the initial state x_i = 0 for i = 0 ... n-1 and up_i = false for i = 1 ... n-1

MOVE_UP_0 (*action for the bottom process only to move token up*)
   Precondition: x_0 = x_1 and up_1 = false
   Effect: x_0 := ~x_0

MOVE_DOWN_{n-1} (*action for top process only to move token down*)
   Precondition: x_{n-2} ≠ x_{n-1}
   Effects:
      x_{n-1} := x_{n-2};

MOVE_UP_i, 1 ≤ i ≤ n-2 (*action for other processes to move token up*)
   Precondition: x_i ≠ x_{i-1}
   Effects:
      x_i := x_{i-1};
      up_i := true; (*point upwards in direction token was passed*)

MOVE_DOWN_i, 1 ≤ i ≤ n-2 (*action for other processes to move token down*)
   Precondition: x_i = x_{i+1} and up_i = true and up_{i+1} = false
   Effect: up_i := false; (*point downwards in direction token was passed*)

All actions are in a single class with upper bound t_n.

Figure 4.6: Automaton D2: a version of Dijkstra's second example with initial states. The protocol does
token passing on a line using nodes with at most 4 states.
Figure 4.7: In the good states for Dijkstra's second example, the line can be partitioned into 2 bands with
the token at the boundary.

point upward. The token is at the boundary between the two bands. If the bit value
$X$ of the upper band is equal to the bit value $Y$ of the lower band, the token is moving
downwards; if the two bit values are unequal, the token is moving upwards.

We describe these "good states" of D2 (that occur in correct executions) in terms 
of local predicates. In the shared memory model, a local predicate is any predicate 
that only refers to the state variables of a pair of neighbors. Intuitively, we see that if 
two neighboring nodes have the same pointer value, then their bits are equal; also if a 
node is pointing upwards, then so is its lower neighbor. Thus in a good state of D2,
two properties are true for any Process $i$ other than 0:

• If $up_{i-1} = up_i$ then $x_{i-1} = x_i$.

• If $up_i = true$ then $up_{i-1} = true$.

First, we prove that if these two local predicates hold for all i = 1 ... n-1, then
there is exactly one action enabled. Intuitively, since up_{n-1} = false and up_0 = true, we
can start with process n-1 and go down the line until we find a pair of nodes i and i-1
such that up_i = false and up_{i-1} = true. Consider the first such pair. Then the second
predicate guarantees that there is exactly one such pair. The first predicate then
guarantees that all nodes j < i-1 have X_j = X_{i-1} and all nodes k > i have X_k = X_i.
Thus only one action is enabled. If X_i = X_{i-1} and i-1 ≠ 0 then only MOVE_DOWN_{i-1}
is enabled. If X_i = X_{i-1} and i-1 = 0 then only MOVE_UP_0 is enabled. If X_i ≠ X_{i-1}
and i ≠ n-1 then only MOVE_UP_i is enabled. If X_i ≠ X_{i-1} and i = n-1 then only
MOVE_DOWN_{n-1} is enabled.

A similar argument shows that if there is exactly one action enabled, then both
local predicates hold for all i = 1 ... n-1.
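The two-way equivalence can be checked mechanically for small lines. The following Python sketch is illustrative only: it brute-forces every state of D2 for n = 2 ... 6 and compares "exactly one action is enabled" with "both local predicates hold for all i". The preconditions are reconstructed from the case analysis above (in particular, MOVE_UP_0 is assumed to require X_0 = X_1 and up_1 = false), up_0 = true and up_{n-1} = false are treated as constants, and the helper names are hypothetical.

```python
from itertools import product


def enabled_actions(x, up):
    """Names of the D2 actions enabled in state (x, up).

    x[i] is the bit of process i and up[i] its pointer; this sketch assumes
    up[0] = True and up[n-1] = False are fixed constants.
    """
    n = len(x)
    acts = []
    if x[0] == x[1] and not up[1]:
        acts.append("MOVE_UP_0")
    for i in range(1, n - 1):
        if x[i] != x[i - 1]:
            acts.append(f"MOVE_UP_{i}")
        if x[i] == x[i + 1] and up[i] and not up[i + 1]:
            acts.append(f"MOVE_DOWN_{i}")
    if x[n - 1] != x[n - 2]:
        acts.append(f"MOVE_DOWN_{n - 1}")
    return acts


def local_predicates_hold(x, up):
    """Both link predicates, checked for every i = 1 .. n-1."""
    for i in range(1, len(x)):
        if up[i - 1] == up[i] and x[i - 1] != x[i]:
            return False
        if up[i] and not up[i - 1]:
            return False
    return True


def check_equivalence(n):
    """Exhaustively compare the two conditions on a line of n >= 2 processes."""
    for x in product([False, True], repeat=n):
        for middle in product([False, True], repeat=n - 2):
            up = (True,) + middle + (False,)
            one_enabled = len(enabled_actions(x, up)) == 1
            if one_enabled != local_predicates_hold(x, up):
                return False
    return True


print(all(check_equivalence(n) for n in range(2, 7)))  # True
```

A counting argument explains the result: every link whose pointers go from true (below) to false (above) enables exactly one action, and there is always at least one such link, so "exactly one action" forces every other link to be locally good.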

Let L be the predicate of D2 which consists of the states of D2 in which exactly 
one action is enabled. It is easy to see that D2 is a tree automaton that is locally 
checkable for L. Then we can use Theorem 4.3.1 to convert D2 into a new automaton 
D2+ which stabilizes to executions in which there is exactly one token in each state.

The correction actions we add are once again different from the original actions in 
[Dij74]. However, the correction actions we add (and consequently the proofs) are
much more transparent than the original version. 

Dijkstra also described two more stabilizing mutual exclusion protocols, one using
K-state machines, where K is greater than the number of processes, and a solution
using 3-state machines. In both solutions the topology is assumed to be a ring (i.e.,
the bottom and top processes are connected). The first protocol is easily seen to be
locally checkable. Unfortunately it is no longer a tree automaton and hence Theorem
4.3.1 does not apply directly. However, with some extra work it is possible to derive
the final protocol as a basic protocol augmented with correction actions that ensure
that each link predicate becomes true. An even simpler way is to derive the K-state
protocol using an idea that we call counter flushing (see Chapter 10 and Appendix E).

Dijkstra's three state protocol uses a different idea altogether. The protocol is not 
even locally checkable. This protocol seems extremely specific to the ring topology 
used. There are two main ideas. First, tokens are passed up and down a line in 
normal operation just as in the second example. Thus if there is more than one token 
and tokens must keep moving, the tokens must eventually "collide" at some node. 
This node can then destroy one of the tokens. This idea seems limited to a line/ring 
topology. The first idea, however, is not sufficient to detect a situation in which there 
are no tokens. The second idea is to exploit the neighbor relation between the top 
and bottom processes to detect the presence of no tokens. The states of the system 
are such that if there are no tokens, then all processes will have the same state. In 
normal operation, the states of the top and bottom nodes are always different. Thus
the absence of tokens can be "suspected" locally if the state of the bottom node is
equal to that of the top node. In that case, a token is manufactured. The actual
protocol can be understood using these ideas. 



4.5 Summary 

Much of the initial work in self-stabilization was done in the context of Dijkstra's 
shared memory model of networks. Later, the work on local checking and correction 
was introduced [APV91b] in a message passing model. The main contribution of this 
chapter is to show that existing work in the shared memory model can be understood
crisply in terms of local checking and correction. Protocols that appeared to be
somewhat ad hoc are shown to have a common underlying principle. 

However, as we have argued at the beginning of this chapter, we believe that 
message passing models are more useful and realistic. For the rest of the thesis we will 
concentrate on message passing models. The definitions of network automata, local 
predicates, local checkability, local correctability, and link subsystems that we used in
this chapter are specific to shared memory systems. In the next chapter (Chapter 5) we 
will introduce definitions of these concepts for networks in which nodes communicate 
by message passing. The definitions in Chapter 5 will be used for the remainder of the 
thesis. 

The main theorem in this chapter states that any locally checkable protocol that 
uses a tree topology can be efficiently stabilized. As the reader might expect, there is a 
corresponding Tree Correction theorem for message passing systems that is described 
at the end of Chapter 6. 
Chapter 5

Local Checking and Correction for 
Network Protocols 



In Chapter 4 we introduced a method of local checking and correction using a shared 
memory model taken from [Dij74]. Recall that in such a model, nodes communicate 
with their neighbors by reading the state of all neighbors in one atomic action. Thus 
there is no need to model channels between nodes. The shared memory model allowed 
us to introduce local checking and correction in a fairly simple way. However, it is not 
very realistic because of the high degree of atomicity that it assumes. In this chapter, 
and for the rest of the thesis, we will model communication between nodes by explicit 
message passing through links. However, we will restrict ourselves to a special type of 
link called a Unit Storage Data Link or UDL. 

We begin in Section 5.1 by describing our model of a network protocol. We do so by 
modelling the network topology, the links between nodes by Unit Storage links, and the 
nodes themselves. Our model of a Unit Storage Link is new, and so in Section 5.2 we 
argue that such links can be implemented in real-life networks. Section 5.3 introduces 
the important concept of locality in network protocols: some key concepts such as 
local subsystems, local checkability, and local correctability are defined in this section. 
While many of the ideas are similar to the ideas in Chapter 4 (that were developed 
for shared memory systems), the new definitions are slightly more complex because of 
the presence of channels between nodes. The definitions in this chapter are used for 
the remainder of the thesis. 

In Section 5.4 we state the main result of this chapter, the Local Correction Theorem.
In essence, this result states that any locally correctable protocol can be stabilized
using a simple transformation. The transformation involves the addition of extra
actions to do local checking and correction. Section 5.4 also contains a formal description
of the transformation. The next two sections contain a proof of the Local Correction
Theorem. We first provide an intuitive "proof" that presents the main ideas and then 
present a formal proof. The formal proof is important because it shows how we use 
the proof techniques of Chapter 4 to formally prove stabilization results. The proofs 
of the Tree and Global Correction theorems in later chapters are much more intuitive; 
the formal proof in this chapter provides an important example of how such intuitive 
proofs can be formalized. 

Finally, Section 5.7 argues that the method of local checking and correction (that 
is formalized by the Local Correction Theorem) is practical, and can be added to real 
networks without an appreciable loss in efficiency. 



5.1 Modelling Network Protocols 

For the rest of this thesis, we will restrict ourselves to proving stabilization properties 
for network protocols and network automata. To model a network protocol, we need 
to model the network topology, the links between nodes, and the nodes themselves. 
Our model is essentially the standard asynchronous message passing model except for 
three differences: 

• The major difference is that links are restricted to store at most one packet at a 
time. 

• The nodes are restricted to use a certain stylized discipline for sending packets 
on unit storage links. 

• We assume that for every pair of neighbors, there is some a priori way of assigning 
one of the two nodes as the "leader" of the pair. 

We will argue that even with these differences our model can be implemented in 
real networks. 
5.1.1 Modelling Network Topology

We use a directed graph G to specify the network topology. For any two neighboring 
nodes u and v, there are two edges (u,v) and (v,u). The network automaton will have 
a unidirectional channel corresponding to each directed edge.

In addition, we require that G satisfies the following property: there is a function 
that for any pair of neighboring nodes u and v in G assigns one of the two nodes as 
the "leader" of the edges (u,v) and (v,u). Thus we are really requiring a way to break 
symmetry between neighboring nodes in G. 

It is possible to remove this assumption (of having a leader function) at the cost of 
some increased complexity in the protocols. However this assumption is not restrictive 
in practice. In a real implementation, if every node has a unique ID then a simple 
stabilizing protocol can elect the minimum ID node as leader. Each node can periodically
transmit its ID to its neighbor and both nodes choose the minimum ID. If the
nodes do not have IDs then an equally simple randomized protocol can elect a leader 
on every link. We prefer to encode the leader function directly in the graph G instead 
of presenting these simple protocols explicitly. 

In summary, the nodes and edges of G correspond to the actual physical topology 
of the network, while the leader function describes a way to break symmetry between 
neighboring nodes. Formally: 

We will call a directed graph (V, E) symmetric if for every edge (u,v) ∈ E there is
an edge (v,u) ∈ E.

Definition 5.1.1 A topology graph G = (V,E,l) is a symmetric, directed graph
(V,E) together with a leader function l such that for every edge (u,v) ∈ E, l(u,v) =
l(v,u) and either l(u,v) = u or l(u,v) = v.

We use E(G) and V(G) to denote the set of edges and nodes in G. If it is clear 
what graph we mean we sometimes simply say E and V. As usual, if (u,v) ∈ E we
will call v a neighbor of u. 

The following definition of a leader edge is useful later. Of the two possible edges 
between two neighbors, it produces the edge directed away from the leader. 

Definition 5.1.2 We call (u,v) a leader edge of G if (u,v) ∈ E and l(u,v) = u.
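A minimal Python sketch of Definitions 5.1.1 and 5.1.2 follows. It assumes, as in the minimum-ID election mentioned above, that node identifiers are comparable and that the leader of each link is the endpoint with the smaller ID; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TopologyGraph:
    """Symmetric directed graph plus a leader function (sketch).

    Illustrative assumption: the leader of a pair is the smaller node ID.
    """
    nodes: set
    edges: set = field(default_factory=set)

    def add_link(self, u, v):
        # A physical link contributes both directed edges (u,v) and (v,u).
        self.edges.add((u, v))
        self.edges.add((v, u))

    def leader(self, u, v):
        if (u, v) not in self.edges:
            raise KeyError(f"({u},{v}) is not an edge")
        return min(u, v)  # symmetric by construction: l(u,v) = l(v,u)

    def is_symmetric(self):
        return all((v, u) in self.edges for (u, v) in self.edges)

    def leader_edges(self):
        """Edges (u,v) with l(u,v) = u, i.e. directed away from the leader."""
        return {(u, v) for (u, v) in self.edges if self.leader(u, v) == u}


g = TopologyGraph(nodes={1, 2, 3})
g.add_link(1, 2)
g.add_link(3, 2)
print(sorted(g.leader_edges()))  # [(1, 2), (2, 3)]
```

Exactly one of the two directed edges between neighbors is a leader edge, which is the symmetry-breaking property the model relies on.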
5.1.2 Modelling Network Links

Traditional models of a data link have used what we call Unbounded Storage Data 
Links that can store an unbounded number of packets. Now, real physical links do 
have bounds on the number of stored packets. However, the unbounded storage model 
is a useful abstraction in a non-stabilizing context. 

Unfortunately, this is no longer true in a stabilizing setting. If the link can store
an unbounded number of packets, it can have an unbounded number of "bad" packets 
in the initial state. It has been shown [DIM91a] that almost any non-trivial task is 
impossible in such a setting. Thus in a stabilizing setting it is necessary to define Data 
Links that have bounded storage. 

A network automaton for topology graph G consists of a node automaton for every 
vertex in G and one channel automaton for every edge in G. We will restrict ourselves 
to a special type of channel automaton, a unit storage data link or UDL for short. 
Intuitively, a UDL can only store at most one packet at any instant. Node automata 
communicate by sending packets to the UDLs that connect them. In the next section, 
we will argue that a UDL can be implemented over real physical channels. 

We fix a packet alphabet P. We assume that P = P_data ∪ P_control consists of two
disjoint packet alphabets. These correspond to what we call data packets and control 
packets. The specification for a UDL will allow both data and control packets to be 
sent on a UDL. 

Definition 5.1.3 We say that C_{u,v} is the UDL corresponding to ordered pair (u,v)
and with link delay t if C_{u,v} is the UIOA defined in Figure 5.1.

By the convention we have established, C_{u,v} is a UIOA since we have not defined
any start states for C_{u,v}. The external interface to C_{u,v} includes an action to send a
packet at node u (SEND_{u,v}(p)), an action to receive a packet at node v (RECEIVE_{u,v}(p)),
and an action FREE_{u,v} to tell the sender that the link is ready to accept a new packet.
The state of C_{u,v} is simply a single variable Q_{u,v} that stores a packet or has the default
value of nil.

Notice two points about the specification of a UDL. The first is that if the UDL 
has a packet stored, then any new packet sent will be dropped. Second, the FREE 
action is enabled continuously whenever the UDL does not contain a packet. 
Each p belongs to the packet alphabet P defined above.

The state of the automaton consists of a single variable Q_{u,v} ∈ P ∪ {nil}.

SEND_{u,v}(p) (*input action*)
Effect:

If Q_{u,v} = nil then Q_{u,v} := p;

FREE_{u,v} (*output action*)
Precondition: Q_{u,v} = nil
Effect: None

RECEIVE_{u,v}(p) (*output action*)
Precondition: p = Q_{u,v} ≠ nil
Effect: Q_{u,v} := nil;

The Free and Receive actions are in separate classes with an upper 
bound called the link delay which is equal to t for both classes. 



Figure 5.1: Unit Storage Data Link automaton 
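To make Figure 5.1 concrete, here is a small Python model of one UDL. It is a sketch: timing (the link delay bound) is not modelled, so the caller decides when the FREE and RECEIVE outputs fire, and None stands in for nil.

```python
from typing import Any, Optional


class UDL:
    """Unit Storage Data Link C_{u,v} (sketch of Figure 5.1)."""

    def __init__(self, q: Optional[Any] = None):
        self.q = q  # Q_{u,v}; any initial value is allowed (no start states)

    def send(self, p):
        # Input action SEND_{u,v}(p): always accepted, but p is silently
        # dropped if the link already stores a packet.
        if self.q is None:
            self.q = p

    def free_enabled(self) -> bool:
        # Output action FREE_{u,v} is enabled exactly when the link is empty.
        return self.q is None

    def receive(self):
        # Output action RECEIVE_{u,v}(p): precondition p = Q_{u,v} != nil.
        if self.q is None:
            raise RuntimeError("RECEIVE is not enabled: the link is empty")
        p, self.q = self.q, None
        return p


link = UDL()
link.send("a")
link.send("b")              # dropped: the link already holds "a"
print(link.receive())       # a
print(link.free_enabled())  # True
```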
5.1.3 Modelling Network Nodes

Next we specify node automata. We do so using a set that contains a node automaton 
for every node in the topology graph. For every edge incident to a node u, a node 
automaton N_u must have interfaces to send and receive packets on the channels corresponding
to that edge. However, we will go further and require that nodes obey a
certain stylized convention in order to receive feedback from and send packets on links. 

In the specification for a UDL if a packet p is sent when the UDL already has a 
packet stored, then the new packet p is dropped. We will prevent packets from being 
dropped by requiring that the sending node keep a corresponding free variable for the 
link that records whether or not the link is free to accept new packets. The sender sets 
the free variable to true whenever it receives a FREE action from the link. We require 
that the sender only send packets on the link when the free variable is true. Finally, 
whenever the sender sends a packet on the link, the sender sets its free variable to 
false. 

We wish the interface to a UDL to stabilize to "good behavior" even when the 
sender and link begin in arbitrary states. Suppose the sender and the link begin in 
arbitrary states. Then we can have two possible problems. First, if free = true but 
the UDL contains a packet, then the first packet sent by the sender can be dropped. 
However, it is easy to see that all subsequent packets will be accepted and delivered 
by the link. This is because after the first packet is sent, the sender will never set free 
to true unless it receives a FREE notification from the link. But a FREE notification 
is delivered to the sender only when the link is empty. The second possible problem 
is deadlock. Suppose that initially free = false but the channel does not contain a 
packet. To avoid deadlock, the UDL specification ensures that the FREE action is 
enabled continuously whenever the link does not contain a packet. 
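The two cases above can be exercised with a small Python simulation of one sender and one UDL started in arbitrary states. The scheduler is a hypothetical fixed-priority choice (SEND, then RECEIVE, then FREE) standing in for the asynchronous, timed environment; it illustrates the two cases in the text rather than proving them for all schedules.

```python
from collections import deque


class Link:
    """Minimal UDL: stores at most one packet; None means nil."""

    def __init__(self, q=None):
        self.q = q


class Sender:
    """Node-side discipline of Figure 5.2 for a single outgoing link (u,v)."""

    def __init__(self, free, packets):
        self.free = free              # free_u[v], arbitrary initial value
        self.queue = deque(packets)   # queue_u[v]

    def send_enabled(self):
        return self.free and bool(self.queue)

    def send(self, link):
        # SEND_{u,v}(p) with p at the head of queue_u[v].
        p = self.queue.popleft()
        self.free = False
        if link.q is None:            # the UDL drops p if it already holds a packet
            link.q = p


def run(free, stored, packets, max_steps=100):
    """Fire one enabled action per step, preferring SEND, then RECEIVE, then FREE."""
    sender, link, delivered = Sender(free, packets), Link(stored), []
    for _ in range(max_steps):
        if sender.send_enabled():
            sender.send(link)
        elif link.q is not None:      # RECEIVE_{u,v}(p)
            delivered.append(link.q)
            link.q = None
        elif not sender.free:         # FREE_{u,v}: enabled whenever the link is empty
            sender.free = True
        else:
            break                     # queue empty, link empty, free already true
    return delivered


print(run(True, "junk", ["p1", "p2", "p3"]))  # ['junk', 'p2', 'p3']
print(run(False, None, ["p1", "p2"]))         # ['p1', 'p2']
```

In the first run only p1 is lost (the stale packet is still delivered, as part of the arbitrary initial state); in the second, the continuously enabled FREE action breaks the potential deadlock.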

Thus we will require that each sending node u keep a corresponding free_u[v] variable
for each neighbor v. By our convention, u can only send packets to v when free_u[v] =
true. Thus we will also require that u enqueue packets that it wants to send to v in an
outbound queue for the link called queue_u[v]. When free_u[v] becomes true, the packet
at the head of the outbound queue is sent on the link. The use of the queuing and
free disciplines is quite natural for a UDL; more importantly, these conventions make 
it easy to transform a node automaton to do local checking. This will become clearer 
in a few sections. 
The preceding paragraphs motivate the following formal definition.

Definition 5.1.4 We say that an automaton N is a node automaton for node u in 
graph G = (V,E,l) and with node delay t if: 

• N has an output action SEND_{u,v}(p) and input actions RECEIVE_{v,u}(p) and FREE_{u,v}
for each p ∈ P_data and for each neighbor v of u.

• N has a boolean variable free_u[v] and a queue variable queue_u[v] for every v
such that (u,v) ∈ E. The queue variable queue_u[v] is a queue of bounded size
consisting of packets drawn from the data packet alphabet P_data.¹

• Actions of N other than SEND_{u,v}(p) can only change queue_u[v] by adding packets
(drawn from alphabet P_data) to the tail of queue_u[v]. Of course, such actions can
leave the queue unchanged.

• The code for the output action SEND_{u,v}(p) and the input action FREE_{u,v} at node
N is as shown in Figure 5.2.

• Each SEND_{u,v}(p) action in N is in its own class with upper bound called the node
delay that is equal to t for all classes.

In particular, note that every SEND_{u,v}(p) action at a node is a locally controlled
action. Also, in all the network automata described in this thesis, the transitions
in automaton N_u will only depend on the state of u, the leader function l, and the
identities of the neighbors v of u. In other words, N_u will use only local information
available to u about the graph G. We prefer not to formalize this requirement as
part of the definition of a network automaton. Instead, we will use this as an informal
criterion to rule out trivial solutions (to some problems) in which N_u encodes the entire
graph.



¹ The reader may be puzzled by the fact that the links accept both data and control packets but the
node automata only send data packets. In a few sections, we will create augmented node automata by 
adding actions to ordinary node automata. These extra actions are used to send and receive control 
packets for the purposes of local checking/correction. 
SEND_{u,v}(p) (*output action to send packet at head of outbound queue*)
Precondition: free_u[v] = true and p is head of queue_u[v]
Effect: free_u[v] := false;
        Remove p from head of queue_u[v]

FREE_{u,v} (*input action*)
Effect: free_u[v] := true; (*record that link is free*)



Figure 5.2: Code at a node automaton to send a data packet and to respond to a free signal from a link 



5.1.4 Network Automata 

Now we are ready to define a network automaton. Naturally, it is the composition of 
a set of node and channel automata. 

Definition 5.1.5 Let G = (V,E,l) be a topology graph. Let N be a set containing
exactly one node automaton N_u for every u ∈ V, such that each N_u has node
delay t. Let C_{u,v} be a UDL for each (u,v) ∈ E such that every UDL has link delay
t'. Then Net(G,N,t,t'), the network automaton with node delay t and link delay t',
is the composition of the automata N_u for all u ∈ V with the automata C_{u,v} for all
(u,v) ∈ E.

For the most part we will deal with network automata in which the node and link
delays are fixed. Thus we let t_n denote the default node delay and t_l denote the
default link delay. If we do not explicitly mention the node and link delays, then
the node and link delays are assumed to be t_n and t_l respectively. Thus we will use
Net(G, N) to denote Net(G, N, t_n, t_l). We will sometimes say that a time t is a constant
if t = c_1 t_n + c_2 t_l, where c_1 and c_2 are some real scalar constants. We use "constant
time" to emphasize that the time does not depend on the size of the network but only 
on the node and link delays. 
[Schematic: SEND(p) enters and FREE leaves the Data Link Sender; the sender is connected to the Data Link Receiver by a Physical Channel; RECEIVE(p) leaves the receiver.]

Figure 5.3: Implementing a UDL over a physical channel 

5.2 Implementing Our Model in Real Networks 

In a real network implementation, the physical channel connecting any two neighboring 
nodes would typically not be a UDL. For example, a telephone line connecting two 
nodes can often store more than one packet. The physical channel may also not deliver 
a free signal. Instead, an implementation can construct a Data Link protocol on top 
of the physical channel such that the resulting Data Link protocol stabilizes to the 
behaviors of a UDL (e.g. [AB89], [Spi88a]). 

Figure 5.3 shows the structure of such a Data Link protocol over a physical link. 
The sender end of the Data Link protocol has a queue that can contain a single packet. 
When the queue is empty, the FREE signal is enabled. When a SEND(p) arrives and 
the queue is empty, p is placed on the queue; if the queue is full, p is dropped. If 
there is a packet on the queue, the sender end constantly attempts to send the packet. 
When the receiving end of the Data Link receives a packet, the receiver sends an ack 
to the sender. When the sender receives an ack for the packet currently in the queue, 
the sender removes the packet from the queue. 

If the physical channel is initially empty and the physical channel is FIFO (i.e., 
does not permute the order of packets), then a standard stop and wait or alternating 
bit protocol [BSW69] will implement a UDL. However, if the physical channel can 
initially store packets, then the alternating bit protocol is not stabilizing [Spi88a]. 
There are two approaches to creating a stabilizing stop and wait protocol. Suppose 
the physical channel can store at most X packets in both directions. Then [AB89] 
suggest numbering packets using a counter that has at least X + 1 values. Suppose
instead that no packet can remain on the physical channel for more than a bounded 
amount of time. [Spi88a] exploits such a time bound to build a stabilizing Data Link 
protocol. The main idea is to use either numbered packets or timers to "flush" the 
physical channel of stale packets. 
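For the non-stabilizing case described above (a FIFO physical channel that starts empty), the sender/receiver structure of Figure 5.3 can be sketched in Python as an alternating-bit, stop-and-wait link. This is an illustration under stated assumptions: the random loss model, its seed, and all names are hypothetical, and, as the text notes, this scheme is not stabilizing if the physical channel can initially hold stale packets.

```python
import random
from collections import deque


class StopAndWaitLink:
    """Alternating-bit Data Link meant to behave like a UDL, assuming the
    physical channel is FIFO, may lose packets, and starts empty."""

    def __init__(self, loss=0.2, seed=0):
        self.rng = random.Random(seed)  # hypothetical loss model
        self.loss = loss
        self.slot = None          # sender's one-packet queue
        self.send_bit = 0         # bit attached to the packet in the slot
        self.expect_bit = 0       # bit the receiver expects next
        self.data_chan = deque()  # physical channel, sender -> receiver
        self.ack_chan = deque()   # physical channel, receiver -> sender
        self.delivered = []       # packets output by RECEIVE(p)

    def free_enabled(self):
        return self.slot is None  # FREE is enabled while the queue is empty

    def send(self, p):
        if self.slot is None:     # as in a UDL, p is dropped if the queue is full
            self.slot = p

    def _transmit(self, chan, item):
        if self.rng.random() >= self.loss:
            chan.append(item)

    def step(self):
        # Sender end: keep (re)transmitting the queued packet.
        if self.slot is not None:
            self._transmit(self.data_chan, (self.send_bit, self.slot))
        # Receiver end: deliver each new packet once, but ack every copy.
        if self.data_chan:
            bit, p = self.data_chan.popleft()
            if bit == self.expect_bit:
                self.delivered.append(p)
                self.expect_bit ^= 1
            self._transmit(self.ack_chan, bit)
        # Sender end: an ack carrying the current bit empties the queue.
        if self.ack_chan:
            ack = self.ack_chan.popleft()
            if self.slot is not None and ack == self.send_bit:
                self.slot = None
                self.send_bit ^= 1


link = StopAndWaitLink(loss=0.2, seed=7)
pending = deque(["p1", "p2", "p3"])
for _ in range(500):
    if pending and link.free_enabled():
        link.send(pending.popleft())
    link.step()
print(link.delivered)
```

Retransmitted copies and stale acks carry the previous bit, so FIFO order lets the receiver and sender ignore them; that is exactly the property that breaks when the channel may start with arbitrary packets.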

A stop and wait protocol is not very efficient over physical channels that have a high 
transmission speed and/or high latency. It is easy to generalize a UDL to a Bounded 
Storage Data Link or BDL that can store more than one packet. For instance, the 
FREE signal for a BDL should be modified to include the number of packets currently 
stored in the BDL. It is also easy to implement a BDL over a physical channel with 
either bounded storage or bounded delay using the techniques described in [AB89] and 
[Spi88a]. We prefer to use a UDL for the rest of this thesis as it provides a simple 
and elegant interface. However, the reader concerned about efficiency should be aware 
that all the protocols in this thesis can be modified (slightly) to work with BDLs. 

Finally, there is one last concern about UDLs. We have seen that real implementations
will use Data Links that stabilize to a UDL. However, in our model every link
is assumed to actually be a UDL. Let S be a network in which each link stabilizes to
the behaviors of a UDL and such that every node automaton is a UIOA. Let S' be the
same network except that every link is replaced by a UDL. Now a UDL is a UIOA and
hence is suffix-closed. Thus a nice consequence of Theorem 3.5.7 is that if S' stabilizes
to the behaviors in some problem P, then so does S. Thus, to prove a stabilization
result about S it suffices to prove the stabilization result about S'. This is an example
of why the modularity theorem (Theorem 3.5.7) is important.

5.3 Locality 

In this section, we reconsider the definitions of local checkability that we introduced in 
Chapter 4. Our new definitions will be slightly more complex because of the presence 
of channels between nodes. These new definitions will be used for the rest of the thesis. 
5.3.1 Link Subsystems and Local Predicates

Consider a network automaton with graph G. Roughly speaking, a property is said 
to be local to a subgraph G' of G if the truth of the property can be ascertained 
by examining only the components specified by G'. For now we will concentrate on
link subsystems that consist of a pair of neighboring nodes u and v and the channels 
between them. In Chapter 10, we will discuss how our methods can be generalized to 
arbitrary subsystems. 

In the following definitions, we fix a network automaton M = Net(G,N). 

Definition 5.3.1 We define the (u,v) link subsystem of M as the composition of N_u,
C_{u,v}, C_{v,u} and N_v.

For any state s of M: s|u denotes s projected onto node N_u and s|(u,v) denotes
s projected onto C_{u,v}. Thus when M is in state s, the state of the (u,v) subsystem is
the 4-tuple (s|u, s|(u,v), s|(v,u), s|v).

A predicate L of M is a subset of the states of M. Let (u,v) be some edge in graph
G of M. A local predicate L_{u,v} of M for edge (u,v) is a subset of the states of the (u,v)
subsystem in M. We use the word "local" because L_{u,v} is defined in terms of the (u,v)
subsystem.

The following definition provides a useful abbreviation. It describes what it means 
for a local property to hold in a state s of the entire automaton. 

Definition 5.3.2 We say that a state s of M satisfies a local predicate L_{u,v} of M iff
(s|u, s|(u,v), s|(v,u), s|v) ∈ L_{u,v}.

We will make frequent use of the concept of a closed predicate. Intuitively, a 
property is closed if it remains true once it becomes true. In terms of local predicates: 

Definition 5.3.3 A local predicate L_{u,v} of network automaton M is closed if for all
transitions (s, π, s') of M, if s satisfies L_{u,v} then so does s'.

The following definitions provide two more useful abbreviations. The first gives a 
name to a collection of local predicates, one for each edge in the graph. The second, 
the conjunction of a collection of "local properties", is the property that is true when
all local properties hold at the same time. As in Chapter 4, we will require that the
conjunction of the local properties is non-trivial, i.e., there is some global state that
satisfies all the local properties. 

Definition 5.3.4 ℒ is a link predicate set for M = Net(G,N) if for each (u,v) ∈ G
there is some L_{u,v} such that:

• If (a,b,c,d) ∈ L_{u,v} then (d,c,b,a) ∈ L_{v,u}. (i.e., L_{u,v} and L_{v,u} are identical except
for the way the states are written down.)

• ℒ = {L_{u,v} : (u,v) ∈ G}

• There is at least one state s of M such that s satisfies L_{u,v} for all L_{u,v} ∈ ℒ.

Definition 5.3.5 The conjunction of a link predicate set ℒ is the predicate {s : s
satisfies L_{u,v} for all L_{u,v} ∈ ℒ}. We use Conj(ℒ) to denote the conjunction of ℒ.

Note that Conj(ℒ) cannot be the null set by the definition of a link predicate set.
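The following Python sketch illustrates Definitions 5.3.2, 5.3.4 and 5.3.5 on a three-node line. It assumes, purely for illustration, a global state stored as two dictionaries (node states and channel contents, with None meaning nil) and a single local predicate, "equal colors and empty channels", used for every edge; all names are hypothetical.

```python
from itertools import product


def project(state, u, v):
    """State of the (u,v) link subsystem: (s|u, s|(u,v), s|(v,u), s|v)."""
    return (state["node"][u], state["chan"][(u, v)],
            state["chan"][(v, u)], state["node"][v])


def same_color(a, b, c, d):
    """Example local predicate: equal colors at both ends, both channels empty."""
    return a == d and b is None and c is None


def satisfies(state, u, v, pred):
    """s satisfies L_{u,v} iff its projection onto the subsystem is in L_{u,v}."""
    return pred(*project(state, u, v))


def in_conj(state, edges, pred):
    """s is in Conj(L) iff it satisfies L_{u,v} for every edge (u,v)."""
    return all(satisfies(state, u, v, pred) for (u, v) in edges)


def is_symmetric(pred, node_vals, chan_vals):
    """First condition of a link predicate set, when L_{u,v} = L_{v,u} = pred."""
    return all(pred(d, c, b, a)
               for a, d in product(node_vals, repeat=2)
               for b, c in product(chan_vals, repeat=2)
               if pred(a, b, c, d))


edges = [(1, 2), (2, 1), (2, 3), (3, 2)]
s = {"node": {1: "red", 2: "red", 3: "red"},
     "chan": {e: None for e in edges}}
print(in_conj(s, edges, same_color))  # True
s["node"][3] = "blue"
print(in_conj(s, edges, same_color))  # False: the (2,3) subsystem is bad
```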

5.3.2 Local Checkability 

Suppose we wish a network automaton M to satisfy some property. An example would 
be the property "all nodes have the same color". We can often specify a property of 
M formally using a predicate L of M. Intuitively, M can be locally checked for L if we
can ascertain whether L holds by checking all link subsystems of M. The motivation
for introducing this notion is performance: in a distributed system we can check all 
link subsystems in parallel in constant time. We formalize the intuitive notion of a 
locally checkable property as follows. 

Definition 5.3.6 A network automaton M is locally checkable for predicate L using
link predicate set ℒ if:

• ℒ is a link predicate set for M and L ⊇ Conj(ℒ).

• Each L_{u,v} ∈ ℒ is closed.




The first item in the definition requires that L holds if a collection of local properties 
all hold. The second item is perhaps more surprising. It requires that each local 
property also be closed. 

We add this extra requirement because in an asynchronous distributed system it 
appears to be impossible to check whether an arbitrary local predicate holds all the 
time. What we can do is to "sample" the local subsystem periodically to see whether 
the local property holds. Suppose the network automaton consists of three nodes u, v 
and w and such that v is the neighbor of both u and w. Suppose the property L that 
we wish to check is the conjunction of two local predicates L_{u,v} and L_{v,w}. Suppose
further that exactly one of the two predicates is always false, and the predicate that is
false is constantly changing. Then whenever we "check" the (u,v) subsystem we might
find L_{u,v} true. Similarly whenever we "check" the (v,w) subsystem we might find L_{v,w}
true. Then we may never detect the fact that L does not hold in this execution. We
avoid this problem by requiring that L_{u,v} and L_{v,w} be closed.
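The scenario above can be replayed in a few lines of Python. The schedule is hypothetical: L_{u,v} holds at even times, L_{v,w} at odd times, and the checker happens to sample (u,v) at even times and (v,w) at odd times. The checker never sees a violation even though L never holds, and neither predicate is closed.

```python
def adversarial_trace(steps):
    """Truth values (L_uv, L_vw) over time: exactly one is false at each step,
    and which one is false alternates (a hypothetical schedule)."""
    return [(t % 2 == 0, t % 2 == 1) for t in range(steps)]


def sample_checks(trace):
    """Check (u,v) at even times and (v,w) at odd times.

    Returns True if any sampled local predicate is observed to be false.
    """
    for t, (l_uv, l_vw) in enumerate(trace):
        observed = l_uv if t % 2 == 0 else l_vw
        if not observed:
            return True
    return False


def is_closed(values):
    """A predicate is closed if, once true, it stays true."""
    return all(not (a and not b) for a, b in zip(values, values[1:]))


trace = adversarial_trace(10)
print(sample_checks(trace))                  # False: no violation ever observed
print(any(a and b for a, b in trace))        # False: L never holds
print(is_closed([a for a, _ in trace]))      # False: L_uv flips back to false
```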

5.3.3 Local Correctability 

The motivation behind local checking was to efficiently ensure that some property L 
holds for network automaton M . We would also like to efficiently correct M to make 
the property true. We have already set up some plausible conditions for local checking. 
Can we find some plausible conditions under which M can be locally corrected?

To this end we define a local reset function f. This is a function with three
arguments: the first argument is a node, say u, the second argument is any state of
node automaton N_u, and the third argument is a neighbor v of u. The function
produces a state of the node automaton corresponding to the first argument. Let s
be a state of M; recall that s|u is the state of N_u. Then f(u, s|u, v) is the state of N_u
obtained by applying the local reset function at u with respect to neighbor v. We will
abuse notation by omitting the first argument when it is clear what the first argument
is. Thus we prefer to write f(s|u, v) instead of the more cumbersome f(u, s|u, v).

We will insist that f meet two requirements so that f can be used for local correction
(Definition 5.3.7).

Assume that the property L holds if a local property L_{u,v} holds for every edge
(u,v). The first requirement is that if any (u,v) subsystem does not satisfy L_{u,v}, then
applying f to both u and v should result in making L_{u,v} hold. More precisely, let us
assume that by some magic we have the ability to simultaneously: 

• Apply f to N_u with respect to v;

• Apply f to N_v with respect to u;

• Remove any packets stored in channels C_{u,v} and C_{v,u}.

Then the resulting state of the (u,v) subsystem should satisfy L_{u,v}. Of course, in a
real distributed system such simultaneous actions are clearly impossible. However, we 
will achieve essentially the same effect by applying a so-called "reset" protocol to the 
(u,v) subsystem. We will describe a stabilizing local reset protocol for this purpose in 
the next section. 

The first requirement allows nodes u and v to correct the (u,v) subsystem if L_{u,v}
does not hold. But other subsystems may be correcting at the same time! Since 
subsystems overlap, correction of one subsystem may invalidate the correctness of an 
overlapping subsystem. For example, the (u,v) and (v,w) subsystems overlap at v. If 
correcting the (u,v) subsystem causes the (v,w) subsystem to be incorrect, then the 
correction process can "thrash". To prevent thrashing, we add a second requirement. 
In its simplest form, we might require that correction of the (u,v) subsystem leaves 
the (v,w) subsystem correct if the (v,w) subsystem was correct in the first place. 

However, there is a more general definition of a reset function f that turns out to
be useful. Recall that we wanted to avoid thrashing that could be caused if correcting 
a subsystem causes an adjacent subsystem to be incorrect. Informally, let us say 
that the (u,v) subsystem depends on the (v,w) subsystem if correcting the (v,w) 
subsystem can invalidate the (u,v) subsystem. If this dependency relation is cyclic, 
then thrashing can occur. On the other hand if the dependency relation is acyclic then 
the correction process will eventually stabilize. Such an acyclic dependency relation 
can be formalized using a partial order < on unordered pairs of nodes: informally, the 
(u,v) subsystem depends on the (v,w) subsystem if {v,w} < {u,v}. 

Using this notion of a partial order, we present the formal definition of a local reset 
function: 
Definition 5.3.7 We say f is a local reset function for network automaton M =
Net(G,N) with respect to link predicate set ℒ = {L_{u,v}} and partial order <, if for any
state s of M and any edge (u,v) of G:

• Correction: (f(s|u,v), nil, nil, f(s|v,u)) ∈ L_{u,v}.

• Stability: For any neighbor w of v,

If (s|u, s|(u,v), s|(v,u), s|v) ∈ L_{u,v} and {v,w} ⊀ {u,v} then
(s|u, s|(u,v), s|(v,u), f(s|v,w)) ∈ L_{u,v}.

Notice that in the special case where all the link subsystems are independent, no 
edge is "less" than any other edge in the partial order. 
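To make the two conditions of Definition 5.3.7 concrete, here is a minimal Python sketch that checks them exhaustively for a hypothetical toy protocol (not one from this thesis). In the toy, each node u keeps an integer x_u[v] for every neighbor v, L_{u,v} requires x_u[v] = x_v[u] with both channels empty, and f(s|u, v) zeroes only the entry kept for v. The names L, f, VALUES and check_reset_function are illustrative assumptions. The partial order is empty (the independent special case just mentioned), and the check only considers a neighbor w ≠ u.

```python
from itertools import product

# Hypothetical toy protocol, not from the text: node u keeps an integer
# x_u[v] for each neighbor v.  L_{u,v} holds iff both channels are empty
# (None plays the role of nil) and x_u[v] == x_v[u].
VALUES = range(3)


def L(su, c_uv, c_vu, sv, u, v):
    return c_uv is None and c_vu is None and su[v] == sv[u]


def f(state, nbr):
    """Local reset with respect to nbr: zero only the entry kept for nbr."""
    new = dict(state)
    new[nbr] = 0
    return new


def check_reset_function(u, v, w):
    """Check Correction and Stability on a path u - v - w (so w != u).

    The partial order is empty, so {v,w} is never below {u,v} and
    Stability must hold unconditionally.
    """
    for xu, xvu, xvw in product(VALUES, repeat=3):
        su = {v: xu}
        sv = {u: xvu, w: xvw}
        # Correction: resetting both ends and emptying channels gives L_{u,v}.
        if not L(f(su, v), None, None, f(sv, u), u, v):
            return False
        # Stability: resetting v w.r.t. w never falsifies a true L_{u,v}.
        # States with a non-empty channel violate L_{u,v}, so they hold vacuously.
        if L(su, None, None, sv, u, v) and not L(su, None, None, f(sv, w), u, v):
            return False
    return True


print(check_reset_function("u", "v", "w"))  # prints True
```

A reset that zeroed every entry of a node's state would fail the Stability check, since resetting v with respect to w would also clobber x_v[u].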

Using the definition of a reset function, we can define what it means to be locally 
correctable. 

Definition 5.3.8 A network automaton M is locally correctable to L using link
predicate set ℒ, local reset function f, and partial order < if:

• M is locally checkable for L using ℒ;

• f is a local reset function for M with respect to ℒ and <.

Intuitively, if we have a reset function f with partial order <, we can expect the
local correction to stabilize in time proportional to the maximum chain length in the
partial order. Recall that a chain is a sequence a_1 < a_2 < a_3 < ... < a_n. Thus the
following piece of notation is useful.

Definition 5.3.9 For any partial order <, height(<) is the length of the maximum 
length chain in <. 
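A small sketch of height(<), assuming the partial order is given as a set of pairs (a, b) meaning a < b over hashable elements such as frozensets of link endpoints; the function name height and this encoding are our own. It computes a longest chain by memoized search, so an order in which no two elements are related has height 1.

```python
from functools import lru_cache


def height(elements, less):
    """Length of a longest chain a_1 < a_2 < ... < a_n in a finite order.

    `less` is a set of pairs (a, b) meaning a < b.  It may list only the
    covering pairs or the whole order; it must be acyclic.
    """
    above = {e: [b for (a, b) in less if a == e] for e in elements}

    @lru_cache(maxsize=None)
    def longest_from(e):
        # Longest chain whose least element is e.
        return 1 + max((longest_from(b) for b in above[e]), default=0)

    return max((longest_from(e) for e in elements), default=0)


# Link subsystems {1,2} < {2,3} < {3,4} form a chain of length 3.
e12, e23, e34 = frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})
print(height([e12, e23, e34], {(e12, e23), (e23, e34)}))  # prints 3
print(height([e12, e23, e34], set()))                      # prints 1
```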

5.4 Local Correction Theorem 

5.4.1 Overview 

In the previous section, we set up plausible conditions under which a network automaton
can be locally corrected to achieve a property L. We claimed that these conditions
could be exploited to yield local correction. In this section we make these claims
precise. We show how to take a network automaton M that can be locally corrected
to L, and transform it into a new automaton M^+. The new automaton M^+ has the
property that in all its executions, property L holds after a bounded amount of time.
More precisely, M^+ stabilizes to the behaviors of M|L in a bounded amount of time.
The next subsection contains a formal statement of this result. 

To transform M into M^+ we will add actions and states to the node automata of M. These
actions will be used to send and receive snapshot packets (used to do local checking
on each link subsystem) and reset packets (used to do local correction on
each link subsystem). For every link (u,v), the leader l(u,v) initiates the checking and
correction.²

5.4.2 Precise Statement of the Result 

To state the result formally, we need the following definitions. First, when we augment 
a network automaton the resulting automaton should have the same topology and 
also be an unrestricted automaton (UIOA) that can start in any state. The topology 
restriction rules out trivial "centralized" solutions. We also require that the links 
remain UDLs. To formalize these requirements, we define a new type of automaton. 

Definition 5.4.1 Let G = (V,E,l) be a topology graph. An automaton for graph
G is the composition of an automaton for each u ∈ V, together with C_{u,v} for each
(u,v) ∈ E. We assume that all automata being composed are compatible.

Notice that any network automaton for graph G is also an automaton for graph 
G. However, an automaton for graph G need not be a network automaton because a 
network automaton has additional constraints (such as having outbound queues and 
free variables for each link) on the node automata. 

A UIOA for graph G is an automaton for graph G that is also a UIOA. Recall 
that we used A|L to denote the automaton identical to A except that its start states
belong to set L. The following piece of shorthand is useful for a concise statement of 
the theorem. 



² Without this assumption, we would have to complicate the code and proof to deal with simultaneous
checking and correction actions by both ends of a link. This can actually be done, thereby getting rid
of the requirement for a leader on links, but it is not worth the increased complexity.
Definition 5.4.2 Let M denote a network automaton. We will use M(t) to denote
the automaton that is identical to M except that the link and node delays in M(t) are
equal to t.

Now (finally!) we can state our theorem. Intuitively, it states that if M is locally
correctable to L using local reset function f and partial order <, then we can transform
M into M^+ such that M^+ satisfies the following property: in time proportional to
height(<), every behavior of M^+ will "look like" a behavior of M in which L holds and
in which the node and link delays are increased by some constant factor.

Theorem 5.4.3 Local Correction: Consider any network automaton M = Net(G,N)
that is locally correctable to L using link predicate set ℒ, local reset function f, and
partial order <. Then there exist some M^+ that is a UIOA for graph G and constants
c and c' such that M^+ stabilizes to the behaviors of M(c)|L in time c'·height(<).

5.4.3 Overview of the Transformation Code 

For those familiar with snapshot protocols, the structure of our local snapshot protocol 
is slightly different from the well-known Chandy-Lamport snapshot protocol [CL85]. It 
is easy to show that the Chandy-Lamport scheme cannot be used without modifications 
over unit storage links. Briefly, the reason is as follows. The correctness proof of 
the algorithm in [CL85] is based on reordering executions while preserving causality 
constraints. The only causality constraint for a link in [CL85] is that any action 
that sends a packet p on link L cannot be reordered to come after an action that 
receives p on link L. However, a UDL has an additional causality constraint. A free 
signal delivered by link L after delivering packet p cannot be reordered to come before 
the action that delivers packet p. The Chandy-Lamport scheme was not designed to 
incorporate this extra causality constraint. As a result, if the Chandy-Lamport scheme 
is used unmodified over a network with UDLs, the snapshot may (incorrectly) return 
a state in which there is more than one packet on a UDL! 

Our local snapshot/reset protocol works roughly as follows. Consider a (u,v) sub- 
system. Assume that l(u,v) = u - i.e., u is the leader on link (u,v). A single snapshot 
or reset phase has the structure shown in Fig 5.4. 
Figure 5.4: The structure of a single phase of the local snapshot/reset protocol

A single phase of either a snapshot or reset procedure consists of u sending a request
that is received by v, followed by v sending a response that is received by u. During
a phase, node u sets a flag (phase_u[v]) to indicate that it is checking/correcting the
(u,v) subsystem. While this flag is set, no packets other than request packets can be
sent on link C_{u,v}. Since a phase completes in constant time, this does not delay the
data packets by more than a constant factor.

In what follows, we will use the basic state at a node u to mean the part of the state 
at u "corresponding" to automaton N u . To do a snapshot, node u sends a snapshot 
request to v. A snapshot request is identified by a mode variable in the request packet 
that carries a mode of snapshot. If v receives a request with a mode of snapshot, node
v records its basic state (say s) and sends s in a response to u.

When u receives the response, it records its basic state (say r). Node u then records
the state of the (u,v) subsystem as x = (r, nil, nil, s). If x ∉ L_{u,v} (i.e., local property
L_{u,v} does not hold), then u initiates a reset.

To do a reset, node u sends a reset request to v. A reset request is identified by
a mode variable in the request packet that carries a mode of reset. Recall that f
denotes the local reset function. After v receives the request, v changes its basic state
to f(s,u), where s is the previous value of v's basic state. Node v then sends a
response to u. When u receives the response, u changes its basic state to f(r,v),
where r is the previous value of u's basic state.
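The snapshot and reset phases just described can be sketched as follows, under strong simplifying assumptions: requests and responses are already properly matched, channels are empty, a reset phase is assumed to be followed by a snapshot phase, and the hypothetical toy L_{u,v} and f (x_u[v] = x_v[u], reset to 0) stand in for the real predicate and reset function. The name run_phase is illustrative and is not part of the thesis code.

```python
# Hypothetical toy predicate and reset function (not from the text):
# node states map each neighbor to an integer, L_{u,v} asks that
# x_u[v] == x_v[u], and f zeroes the entry for one neighbor.
def L(r, s, u, v):
    return r[v] == s[u]  # channels are nil in the recorded state (r, nil, nil, s)


def f(state, nbr):
    new = dict(state)
    new[nbr] = 0
    return new


def run_phase(mode, state_u, state_v, u="u", v="v"):
    """One idealized phase led by u; returns (next_mode, state_u, state_v)."""
    if mode == "snapshot":
        s = dict(state_v)              # v records its basic state and replies with it
        r = dict(state_u)              # u records its own basic state on receipt
        next_mode = "snapshot" if L(r, s, u, v) else "reset"
        return next_mode, state_u, state_v   # a snapshot changes no basic state
    new_v = f(state_v, u)              # v applies f(s, u) on the reset request
    new_u = f(state_u, v)              # u applies f(r, v) on the response
    return "snapshot", new_u, new_v    # assumed: go back to snapshot mode


mode, su, sv = run_phase("snapshot", {"v": 2}, {"u": 1})  # violation found
mode, su, sv = run_phase(mode, su, sv)                    # reset phase
print(mode, su, sv)  # prints snapshot {'v': 0} {'u': 0}
```

Note that a detected violation is repaired only in the following phase, matching the description of the scheduler later in this section.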

Of course, the local snapshot and reset protocol must also be stabilizing. However,
the protocol we just described informally may fail if requests and responses are not
properly matched. This can happen, for instance, if there are spurious packets in the
initial state of M^+. To make the snapshot and reset protocols stabilizing, we number
all request and response packets. Thus each request and response packet carries a
number count. Also, the leader u keeps a variable count_u[v] that u uses to number all
requests sent to v within a phase. At the end of the phase, u increments count_u[v] (mod 4).
Similarly, the responder v keeps a variable count_v[u] in which v stores the number of
the last request it has received from u. Node v weeds out duplicates by only accepting
requests whose number is not equal to count_v[u].
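The numbering rules can be sketched in isolation. The hypothetical Leader and Responder classes below keep only the counters, with values in {0, ..., 3}; they ignore links, modes and node states, so they illustrate the acceptance tests rather than the whole protocol.

```python
class Leader:
    """Counter kept by the leader u for link (u, v): count_u[v] in {0,...,3}."""

    def __init__(self, count):
        self.count = count % 4          # the initial value may be arbitrary

    def request(self):
        return self.count               # every request of a phase carries count_u[v]

    def on_response(self, c):
        """Accept only a response numbered count_u[v]; then end the phase."""
        if c != self.count:
            return False                # stale or spurious response: discard
        self.count = (self.count + 1) % 4
        return True


class Responder:
    """Counter kept by the responder v: the number of the last accepted request."""

    def __init__(self, count):
        self.count = count % 4

    def on_request(self, c):
        """Accept only a request whose number differs from count_v[u]."""
        if c == self.count:
            return False                # duplicate (retransmitted) request
        self.count = c
        return True

    def response(self):
        return self.count               # responses carry the last accepted number


u, v = Leader(2), Responder(1)
print(v.on_request(u.request()))        # prints True: request 2 accepted
print(v.on_request(u.request()))        # prints False: retransmission ignored
print(u.on_response(v.response()))      # prints True: matching response ends phase
```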

Clearly the count values can be arbitrary in the initial state and the first few phases 
may not work correctly. However, numbering and a few easy checks ensure that in 
constant time a response will be properly matched to the correct request. Because the 
links are unit storage, we will see that a space of 4 numbers is sufficient. Our use of 
numbering is taken from the stabilizing global snapshot protocol of [KP90]. However, 
our protocol is simpler and more efficient because we are restricted to a single link 
subsystem. 

Besides properly matching requests and responses, we must also avoid deadlock 
when the local snapshot/reset protocol begins in an arbitrary state. To do so, when 
phase_u[v] is true (i.e., u is in the middle of a phase), u continuously sends requests.
Since v weeds out duplicates, this does no harm and also prevents deadlock. Similarly,
v continuously sends responses to the last request v has received. Once the responses 
begin to be properly matched to requests, this does no harm, because u discards such 
duplicate responses. 

An irritating issue that we have to deal with in creating M^+ is the issue of scheduling
packets to be sent on links. Notice that the checking and correction protocols are going
on concurrently with the protocol corresponding to M. To make Theorem 5.4.3 work,
we need to ensure that any data packets that are placed on the queue for channel C_{u,v}
are sent in constant time. On the other hand, the checking process also needs to send 
request and response packets that are encoded as control packets. 

We build a simple stabilizing scheduler that ensures fair access to the link for each 
of three packet types: requests, responses and data packets. First we notice that at 
the leader end of a link, only requests and data packets need to be sent. At the other 
end, only responses and data packets need to be sent. 

Consider the leader end of a link first. Suppose l(u,v) = u. We know that data
packets should never be sent while a snapshot or reset phase is in progress. Now, we
have a variable phase_u[v] at u which is set to true whenever a phase is in progress.
The phase ends when a matching response is received and phase_u[v] is set to false.
At the end of a phase, the oldest data packet is sent, and a new phase is begun after 
the data packet is sent. The scheduler is stabilizing because if there is no data packet 
waiting to be sent, we allow a new phase to begin immediately after the previous phase 
ends. The net effect is that if there are data packets waiting, we send one data packet 
between consecutive checking/correction phases. If a problem is discovered during a 
snapshot phase, we do not do a reset until the next phase, after sending any waiting 
data packet. 
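A rough sketch of the leader-end scheduling policy, assuming that a whole phase is a single step and that the link is always free when the leader wants to send; leader_schedule is a hypothetical name. The trace shows at most one data packet between consecutive phases, and back-to-back phases when no data is waiting.

```python
from collections import deque


def leader_schedule(data_packets, phases):
    """Trace the leader's link usage until `phases` phases have completed."""
    queue = deque(data_packets)
    phase = False                          # plays the role of phase_u[v]
    trace, done = [], 0
    while done < phases:
        if phase:
            trace.append("phase")          # request/response exchange; no data sent
            phase = False                  # matching response received
            done += 1
        elif queue:
            trace.append(queue.popleft())  # one data packet between phases...
            phase = True                   # ...then a new phase begins
        else:
            phase = True                   # nothing waiting: start a phase at once
    return trace


print(leader_schedule(["d1", "d2"], 4))
# prints ['d1', 'phase', 'd2', 'phase', 'phase', 'phase']
```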

Now consider the end of a link that is not the leader. Suppose l(u,v) = v. To give
"fair turns" to the response packets we use a variable turn_u[v] that has only two values:
data (for data packets) and response (for response packets).
No packet can be sent until either its turn arrives or there is no packet of the other 
type. After a packet of a particular type is sent, the turn is "toggled" to the other 
type. 
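The responder's turn rule admits an equally small sketch; responder_pick and the waiting map are hypothetical names chosen for illustration. If both packet types are always waiting, the responder alternates between them.

```python
TOGGLE = {"data": "response", "response": "data"}


def responder_pick(turn, waiting):
    """Choose what the non-leader end sends next; returns (kind or None, new turn).

    `turn` plays the role of turn_u[v]; `waiting[kind]` says whether a packet
    of that kind is ready.  A kind may be sent if it holds the turn or the
    other kind has nothing waiting; after sending, the turn toggles away
    from the kind just sent.
    """
    if waiting[turn]:
        return turn, TOGGLE[turn]
    other = TOGGLE[turn]
    if waiting[other]:                  # the turn holder has nothing to send
        return other, TOGGLE[other]
    return None, turn                   # nothing to send; keep the turn


turn, sent = "data", []
for _ in range(4):                      # both kinds always waiting
    kind, turn = responder_pick(turn, {"data": True, "response": True})
    sent.append(kind)
print(sent)  # prints ['data', 'response', 'data', 'response']
```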

5.4.4 Constructing Augmented Automata: Formal Description

To transform M into M^+ we will show how to transform each node automaton N_u in
M into a new, augmented node automaton N_u^+. Then M^+ is the composition of the
new node automata and the (unchanged) channel automata.

Assume that network automaton M = Net(G,N) can be locally corrected to L
using link predicate set ℒ = {L_{u,v}} and local reset function f. We create M^+ =
Augment(M, ℒ, f) by adding states and actions to each node automaton N_u as follows:

• We add the following new variables and their domain specifications to the state
of N_u, one for each neighbor v of u:

- count_u[v] ∈ {0, ..., 3} (used to number request and response packets to ensure
proper matching. The magic number 3 arises because a link subsystem can
store at most three distinct counter values: on the sending link, on the receiving
link, and at the receiver node.)

- mode_u[v] ∈ {reset, snapshot} (used to keep track of whether the current
phase is a snapshot or reset phase.)

- phase_u[v] is a Boolean (set to true during a reset or snapshot phase. It is
used by the leader node to inhibit data packets from being sent during a
phase.)

- freeq_u[v] is a Boolean (set to false after any packet is sent and set to true
after any packet is removed from the link. The original variable free_u[v]
will be set to false after a data packet is sent and set to true after a data
packet is removed from the link. We could have optimized by using just one
free variable, but keeping two variables makes the projection and the proof
easier.)

- turn_u[v] ∈ {response, data} (used to keep track of whether a data packet or
a response packet has the next turn to be sent.)

We add two new packet types to M^+. Recall that there are two basic types of
packets, data packets and control packets. We will encode request and response
packets as control packets. We will use the symbols p_data, p_req, and p_resp to denote
data, request and response packets respectively. The format of a data packet is
defined by automaton M. The encoding of the other two packets is:

Request: (Control, Request, count, mode), where count is an integer from 0 to 3,
and mode is either reset or snapshot.

Response: (Control, Response, count, mode, node_state), where count is an
integer from 0 to 3, mode is either reset or snapshot, and node_state is a state
of a node automaton N_u in M.

As usual, we use record notation to extract fields of a packet. Thus p_req.count is
the count field in a request packet.

We modify the SEND_{u,v}(p) (for data packets p) and FREE_{u,v} actions of the original
automaton N_u as shown in Figure 5.5. This code contains modifications for both
the leader and responder ends in one piece of code; however, some parts of the
code apply only to leaders and some parts apply only to responders.

If node u is the leader on link (u,v), then we enable sending data packets only
when phase_u[v] = false, indicating that a phase is not in progress. After the data
packet is sent, we set phase_u[v] = true. If node u is not the leader on link (u,v), then we
enable sending data packets only when turn_u[v] = data. Also, immediately after
sending a data packet, we set turn_u[v] = response, which allows the response
packets to get a fair turn. We use the freeq_u[v] variable to keep track of whether
there is some packet (either data or control) on C_{u,v}, while free_u[v] keeps track of
whether there is a data packet on C_{u,v}. The code appears to have some redundant
checks (for example, the code checks both free variables before sending a data
packet), but these extra checks do make the proof easier.

SEND_{u,v}(p)   (*output action, for p ∈ P_data only*)
    Preconditions:
        freeq_u[v] = true and free_u[v] = true
        p is head of queue_u[v]
        ((l(u,v) = u) and (phase_u[v] = false)) or ((l(u,v) = v) and (turn_u[v] = data))
    Effect:
        freeq_u[v] := false and free_u[v] := false;
        Remove p from head of queue_u[v];
        turn_u[v] := response   (*give response packets a turn; only affects responders*)
        phase_u[v] := true      (*start a new checking/correction phase; only affects leaders*)

FREE_{u,v}   (*input action*)
    Effect: freeq_u[v] := true and free_u[v] := true;

Figure 5.5: Code for the modified SEND_{u,v}(p) actions at a modified node N_u^+.

• We add the actions shown in Figures 5.6 and 5.7 to N_u for each neighbor v of u.
These actions only apply to control packets. We use the following notation. Let
s denote the current state of M^+ when an action is performed. Let s|u denote
the current state of N_u^+ projected onto the original automaton N_u. In order to
project s to s|u we do the following. All variables of N_u take the same values as
the corresponding variables in N_u^+.
SEND_{u,v}(p_req)   (*output action: u repeatedly sends a request till it gets a response*)
    Precondition:
        l(u,v) = u                                     (*u is the leader of link subsystem*)
        (phase_u[v] = true) or (queue_u[v] is empty)   (*phase in progress or no data packets waiting*)
        freeq_u[v] = true                              (*no packet in transit on link to v*)
        p_req.count = count_u[v];                      (*count in packet is count of phase*)
        p_req.mode = mode_u[v];                        (*mode in packet is mode of phase*)
    Effect:
        freeq_u[v] := false                            (*set to false until link says it is free*)
        phase_u[v] := true;                            (*remains true until matching response returns*)

RECEIVE_{v,u}(p_req)   (*input action, receive request at u from v*)
    Effect:
        If p_req.count ≠ count_u[v] and l(u,v) = v then   (*not a duplicate or invalid packet*)
            count_u[v] := p_req.count;                     (*remember count*)
            mode_u[v] := p_req.mode;                       (*remember mode*)

Figure 5.6: Code to send and receive request packets at node u.



When we apply f to the projected state by setting s|u to f(s|u,v), we affect
only the projected variables. Thus, for instance, the value of freeq_u[v] remains
unchanged.

• We add two extra partition classes for every neighbor v of u to N_u. Each output action of
the form SEND_{u,v}(Control, Request, *) is added to a new partition class. Similarly,
each output action of the form SEND_{u,v}(Control, Response, *) is added to a new
partition class. The time associated with all new classes is still the node delay.

• We hide all actions of M^+ that are not actions of M.






SEND_{u,v}(p_resp)   (*output action: u repeatedly sends a response to last request*)
    Precondition:
        l(u,v) = v                                         (*u is not the leader of link subsystem*)
        (turn_u[v] = response) or (queue_u[v] is empty)    (*response's turn or no data packets waiting*)
        freeq_u[v] = true                                  (*no packet in transit on link to v*)
        p_resp.count = count_u[v];
        If mode_u[v] = snapshot then p_resp.node_state = s|u else p_resp.node_state = f(s|u,v)
    Effect:
        If mode_u[v] = reset then s|u := f(s|u,v)          (*reset node u's state locally*)
        mode_u[v] := snapshot                              (*return to default mode of snapshot*)
        turn_u[v] := data                                  (*give data packets a turn*)
        freeq_u[v] := false                                (*set to false until link says it is free*)

RECEIVE_{v,u}(p_resp)   (*input action to receive response at u from v*)
    Effect:
        If (count_u[v] = p_resp.count) and (phase_u[v] = true) and (l(u,v) = u) then
            If mode_u[v] = snapshot then
                If (s|u, nil, nil, p_resp.node_state) ∉ L_{u,v} then mode_u[v] := reset
            Else if mode_u[v] = reset then s|u := f(s|u,v)   (*reset node u's state locally*)
            phase_u[v] := false;                            (*end of phase*)
            count_u[v] := (count_u[v] + 1) mod 4;

Figure 5.7: Code to send and receive response packets at node u.
5.5 Intuitive Proof of Local Correction Theorem

In the previous section we described how to transform a locally correctable automaton
M into an augmented automaton M^+. We have to prove that in time proportional to
the height of the partial order every behavior of M^+ is
a behavior of M in which all local predicates hold.

The basic intuition behind the proof is sketched in Figure 5.8 and Figure 5.9. Con- 
sider some (u,v) subsystem in which u is the leader. We describe the intuition behind 
the use of a counter to ensure proper request-response matching, and the intuition 
behind the local snapshot and reset procedures. 

5.5.1 Intuition Behind Counter Based Matching 

Recall that in the augmented automaton, both local snapshots and local resets are
implemented using a request-response protocol. As we will see below, both the local
snapshot and reset procedures will only work correctly if the response from v is sent
following the receipt of the request at v. The diagram on the left of Figure 5.8 shows
a scenario in which requests are matched incorrectly to "old" responses that were sent
in previous phases. Thus we need each phase to eventually follow the structure shown
on the right of Figure 5.8.

The code of the augmented automaton ensures that correct matching will occur
after at most 5 phases by numbering requests and responses. The code uses a
counter in the range 0..3; the significance of the number 3 will be seen below. The
sender u keeps a counter count_u[v] to number requests and the receiver v keeps a
counter count_v[u] to number responses. Node v accepts a new request numbered c
only if c ≠ count_v[u]. When v accepts this request, v also sets count_v[u] to be equal
to c. Finally, node u accepts a response numbered c only if c = count_u[v]. When u
accepts this response, u also increments count_u[v] mod 4. This is implemented in the
code and illustrated in the diagram on the right of Figure 5.8.

In the first two phases, packets sent by u and v may be dropped by the links because 
the links may have packets stored in the initial state. However, it is easy to see from 
the properties of a UDL, that after at most two phases, the links in both directions are 
drop-free - i.e., any packets sent from that point on will be delivered. Thus we need 
Figure 5.8: Using counter flushing to ensure that request-response matching will work correctly within a
small number of phases. 

to show that within the next three phases, the request-response matching will begin 
to work correctly. This follows from a simple paradigm that we call counter flushing. 

Counter flushing is a general technique that is quite versatile. Chapter 10 describes 
some more applications of counter flushing. In our case, the counter flushing argument 
runs as follows: 

• There can be at most 3 counter values stored in the two links (i.e., the link from 
u to v and the link from v to u) and the receiver. This is the significance of the 
number 3. 

• The sender retransmits till it gets a response and so the sender counter will keep 
being incremented. 

• Within 3 increments of the sender counter, the sender counter will reach a "fresh" 
counter value that is not present in the links and the receiver. This is because 
the counter space has 4 values, and new counter values can only be created by 
the sender. 

• Suppose the sender sends a request numbered c where c is a fresh value that is not 
present in the receiver or on the two links. Then when a later response numbered 
c is received, this response is a matching response because no aliasing is possible.
Also, by the time the sender receives the matching response, the only counter 
value stored in the two links and the receiver is c. In other words, a freshly 
numbered request and its matching response will "flush" the (u,v) subsystem of 
outdated counter values. Hence the term counter flushing. 

• After all old counter values have been flushed, we say that the (u,v) subsystem 
is clean. It is easy to show that all subsequent phases follow the structure shown 
in the diagram on the right of Figure 5.8. 
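The pigeonhole step of the counter-flushing argument can be checked exhaustively. The sketch below treats the set of stale values held in the two links and at the receiver as fixed, which is a simplification of the real dynamics; increments_to_fresh is an illustrative name. With at most 3 stored values and a space of 4, the sender never needs more than 3 increments to reach a fresh value.

```python
from itertools import combinations


def increments_to_fresh(sender, stored):
    """Increments (mod 4) until the sender's counter is absent from `stored`.

    `stored` is the set of counter values held in the two links and at the
    receiver; the text bounds its size by 3, which guarantees termination.
    """
    k, c = 0, sender
    while c in stored:
        c = (c + 1) % 4
        k += 1
    return k


worst = max(
    increments_to_fresh(s, set(vals))
    for s in range(4)
    for size in range(4)                 # 0, 1, 2 or 3 stored values
    for vals in combinations(range(4), size)
)
print(worst)  # prints 3
```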

5.5.2 Intuition Behind Local Snapshots 

The diagram on the left of Figure 5.9 shows why a snapshot works correctly if the
response from v is sent following the receipt of the request at v. Let a' and b be the
state of nodes u and v respectively just before the response is sent. Let a and b' be 
the state of nodes u and v respectively just after the response is delivered. This is 
sketched in Figure 5.9. 

From the code we know that node u does not send any data packets to v during 
a phase. Also v cannot send another data packet to u from the time the response is 
sent until the response is delivered. This is because the link from v to u is a UDL that 
will not give a free indication until the response is received. Recall that nil denotes 
the absence of any packet on a link. Thus the state of the (u,v) subsystem just before 
the response is sent is (a' , nil, nil, b). Similarly, the state of the (u,v) subsystem just 
after the response is delivered is (a, nil, nil, b'). 

We claim that it is possible to construct some other execution of the (u,v) subsystem
which starts in state (a', nil, nil, b), has an intermediate state equal to (a, nil, nil, b) and 
has a final state equal to (a,nil,nil,b'). This is because we could have first applied 
all the actions that changed the state of node u from a' to a, which would cause the 
(u,v) subsystem to reach the intermediate state. Next, we could apply all the actions 
that changed the state of node v from b to b' , which will cause the (u,v) subsystem to 
reach the final state. Note that this construction is only possible because u and v do 
not send data packets to each other between the time the response is sent and until 
the time the response is delivered. 

Thus the state (a, nil, nil, b) recorded by the snapshot is a possible successor of the
state of the (u,v) subsystem when the response is sent. The recorded state is also a
possible predecessor of the state of the (u,v) subsystem when the response is delivered.
But L_{u,v} is a closed predicate: it remains true once it is true. Thus if L_{u,v} was true
just before the response was sent, then the state recorded by the snapshot must also
satisfy L_{u,v}. Similarly, if L_{u,v} is false just after the response is delivered, then the
state recorded by the snapshot does not satisfy L_{u,v} (otherwise closure would force
L_{u,v} to hold after delivery). Thus the snapshot detection mechanism will not produce
false alarms if the local predicate holds at the start of the phase. Also the snapshot
mechanism will detect a violation if the local predicate does not hold at the end of the
phase.

[Figure 5.9 panels: "Correct Snapshot Phase" (Snapshot Request from u to v; states a', b, a, b') and "Correct Reset Phase" (Reset Request from u to v); time axis.]

Figure 5.9: Local Snapshots and Resets work correctly if requests and responses are properly matched.

5.5.3 Intuition Behind Local Resets 

The diagram on the right of Figure 5.9 shows why a local reset works correctly if the
response from v is sent following the receipt of the request at v. Let b be the state
of node v just before the response is sent. Let a and b' be the state of nodes u and v 
respectively just before the response is delivered. This is sketched in Figure 5.9. 

The code for an augmented automaton will ensure that just after the response 
is sent, node v will locally reset its state to f(b,u). Similarly, immediately after 
it receives the response, node u will locally reset its state to f(a,v). Using similar 
arguments to the ones used for a snapshot, we can show that there is some execution
of the (u,v) subsystem which begins in the state (f(a,v), nil, nil, f(b,u)) and ends in
the state (f(a,v), nil, nil, b'). The latter state is the state of the (u,v) subsystem
immediately after the response is delivered. But we know, from the Correction property
of a local reset function, that (f(a,v), nil, nil, f(b,u)) satisfies L_{u,v}. Since L_{u,v} is a closed
predicate, we conclude that L_{u,v} holds at the end of the reset phase.

5.5.4 Intuition Behind Local Correction Theorem 

We can now see intuitively why the augmented automaton will ensure that all local 
predicates hold in time proportional to the height of the partial order. Consider a 
(u,v) subsystem where {w,x} ⊀ {u,v} for any pair of neighbors w,x, i.e., {u,v} is
a minimal element in the partial order. Then, within 5 phases of the (u,v) subsystem
the request-response matching will begin to work correctly. If the sixth phase of the
(u,v) subsystem is a snapshot phase, then either L_{u,v} will hold at the end of the phase
or the snapshot will detect a violation. In the latter case, the seventh phase will
be a reset phase, which will cause L_{u,v} to hold at the end of the seventh phase.

But once L_{u,v} becomes true, it remains true. This is because L_{u,v} is a closed
predicate of the original automaton M and the only extra actions we have added to M^+
that can affect L_{u,v} are actions to locally reset a node using the reset function f. But by
the Stability property of a local reset function, any applications of f at u with respect
to some neighbor other than v cannot falsify L_{u,v}. Similarly, any applications of f at v
with respect to some neighbor other than u cannot falsify L_{u,v}. Thus in constant time,
the local predicates corresponding to link subsystems that are minimal elements in
the partial order will become and remain true.

Now suppose that the local predicates for all subsystems with height ≤ i hold from
some state S_i onward. By similar arguments, we can show that in constant time after
S_i, the local predicates for all subsystems with height i + 1 become and remain true.
Once again, the argument depends crucially on the Stability property of a local reset
function. The intuition is that applications of the local reset function to subsystems
with height ≤ i do not occur after state S_i. But these are the only actions that can
falsify the local predicates for subsystems with height i + 1. The net result is that all
local predicates become and remain true within time proportional to the height of the
partial order <.
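The induction can be mimicked by a round-based sketch in which one round stands for the constant number of phases a subsystem needs once nothing below it in the order can reset. This idealization and the name rounds_to_quiet are our own assumptions; the point is only that the number of rounds equals height(<).

```python
def rounds_to_quiet(elements, less):
    """Number of rounds until every subsystem is quiet.

    In each round, every subsystem all of whose predecessors in the order
    were quiet at the start of the round becomes quiet.  `less` is a set of
    pairs (a, b) meaning a < b and must be acyclic.
    """
    below = {e: {a for (a, b) in less if b == e} for e in elements}
    quiet, rounds = set(), 0
    while len(quiet) < len(elements):
        ready = {e for e in elements if e not in quiet and below[e] <= quiet}
        quiet |= ready
        rounds += 1
    return rounds


a, b, c, d = "a", "b", "c", "d"
diamond = {(a, b), (a, c), (b, d), (c, d)}
print(rounds_to_quiet([a, b, c, d], diamond))  # prints 3, the height of this order
```

With no related subsystems (the independent case) every subsystem becomes quiet in the first round.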
5.6 Formal Proof of Local Correction Theorem

The reader who is satisfied with the "intuitive proof" given above should skip this 
section. However, there are a number of details glossed over in the intuitive proof that 
are spelled out in the formal proof. The formal proof also provides a good example 
of the proof techniques of Chapter 3. Formal proofs for other major theorems in 
this thesis like the Tree Correction theorem (Chapter 6) and the Global Correction 
Theorem (Chapter 8) can be constructed along similar lines. 

5.6.1 Overview of Formal Proof 

We wish to prove the local correction theorem, Theorem 5.4.3. Thus we have to prove
that in time proportional to height(<), every behavior of M^+ will "look like" a behavior
of M in which L holds and in which the node and link delays are increased by some 
constant factor. 

The formal proof is based on the proof technique described in Lemma 3.4.2 of Chapter 3. Thus the proof consists of two major parts:

• We first define a predicate $Q$ (more details below) and show that any execution of $\mathcal{N}^+$ stabilizes to the executions of $\mathcal{N}^+|Q$. We prove this using the Execution Convergence theorem, Theorem 3.4.5.

• We show that any behavior of $\mathcal{N}^+|Q$ is also a behavior of $\mathcal{N}(c)|L$, for some constant $c$. We prove this using the Refinement Mapping theorem, Theorem 3.4.3.

We now present a more detailed roadmap of each part of the formal proof. 

The first part of the proof is described in Sections 5.6.2 to 5.6.4. To show that $\mathcal{N}^+$ stabilizes to the executions of $\mathcal{N}^+|Q$ we use another two-step process:

• In Section 5.6.3 we formally define the concept of a clean link alluded to earlier. Intuitively, a link is clean if the snapshot numbering scheme is working correctly, meaning that requests and responses will be properly matched. We use the predicate $C$ to denote the fact that all links are clean. At the end of Section 5.6.3, in Theorem 5.6.14, we prove that $\mathcal{N}^+$ stabilizes to the executions of $\mathcal{N}^+|C$ in constant time. The proof is based on the counter flushing intuition described earlier.

• In Section 5.6.4 we define another important concept called a quiet link. Recall that our intent is to make the local predicate $L_{u,v}$ hold for every link. Intuitively, a link $(u,v)$ is quiet if two properties hold. First, $L_{u,v}$ holds. Second, if all links less than $(u,v)$ in the partial order are also quiet, then there will be no more reset actions on that link. We use the predicate $Q$ to denote the fact that all links are quiet. At the end of Section 5.6.4, in Theorem 5.6.22, we prove that $\mathcal{N}^+|C$ stabilizes to the executions of $\mathcal{N}^+|Q$ in time proportional to the height of the partial order.

The second part of the proof (Section 5.6.5, Lemma 5.6.23) shows that any behavior of $\mathcal{N}^+|Q$ is also a behavior of $\mathcal{N}(c)|L$, for some constant $c$. This is done using the Refinement Mapping theorem, Theorem 3.4.3. In order to use this theorem we must derive a projected state of $\mathcal{N}$ from a state of $\mathcal{N}^+$. We have already seen how to derive a projected state $s|u$ of a node $\mathcal{N}_u$ from a state $s$ of $\mathcal{N}^+$. To complete our job, we need to state how to project the state of the channels in $\mathcal{N}^+$ onto $\mathcal{N}$.

Consider the problem of projecting the channel state. Intuitively, in $\mathcal{N}^+$, the original automaton $\mathcal{N}$ "shares" each channel with request and response packets. Consider the behavior of $\mathcal{N}^+$: when a control packet is on the channel $C_{u,v}$, we can pretend that, as far as $\mathcal{N}$ is concerned, the channel is really empty. There are two consequences of this. First, if the projected $\mathcal{N}_u$ has a data packet to send, it can take longer before the data packet is sent. Second, it can take longer before a free signal is delivered by the link. Both delays occur when there is a control packet on the link. We model this by saying that in the projected behavior the node and link delays are increased by a constant factor. This is the source of the increased link and node delays in Theorem 5.4.3.

Definition 5.6.1 For any state $s$ of $\mathcal{N}^+$, we define $Proj(s)$, the state of $\mathcal{N}^+$ projected onto $\mathcal{N}$, as follows:

• For any two neighboring nodes $u$ and $v$: if $s.Q_{u,v} \in P_{data}$ then $Proj(s).Q_{u,v} = s.Q_{u,v}$, else $Proj(s).Q_{u,v} = nil$. (I.e., if there is a data packet $p$ in channel $C_{u,v}$ in the original state, then $p$ is also present in the projected state; if not, $C_{u,v}$ is considered empty in the projected state.)

• All other variables of $\mathcal{N}$ have the same values in state $Proj(s)$ as in state $s$.
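Definition 5.6.1 amounts to a simple filter on channel contents. The Python sketch below is a hypothetical rendering under our own representation choices: `DataPacket`, `ControlPacket` and `proj` are illustrative names, channels are a dictionary keyed by directed edge, `None` plays the role of $nil$, and only the variables of $\mathcal{N}$ are passed in, so the control variables of $\mathcal{N}^+$ are dropped by construction.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union


@dataclass(frozen=True)
class DataPacket:          # a packet of the original automaton N
    payload: object


@dataclass(frozen=True)
class ControlPacket:       # a request or response packet added by N+
    kind: str              # "req" or "resp"
    count: int
    mode: str              # "snapshot" or "reset"


Packet = Union[DataPacket, ControlPacket]
Channels = Dict[Tuple[str, str], Optional[Packet]]


def proj(channels: Channels, orig_vars: dict) -> Tuple[Channels, dict]:
    """Project a state of N+ onto N.

    Data packets stay in their channel; a control packet is treated as an
    empty channel (None plays the role of nil). The variables of N are
    copied unchanged.
    """
    projected = {
        edge: (p if isinstance(p, DataPacket) else None)
        for edge, p in channels.items()
    }
    return projected, dict(orig_vars)


state = {("u", "v"): ControlPacket("req", 2, "snapshot"), ("v", "u"): DataPacket("x")}
chans, node_vars = proj(state, {"u": {"queue": []}})
print(chans)  # {('u', 'v'): None, ('v', 'u'): DataPacket(payload='x')}
```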

We complete the second part of the proof (Lemma 5.6.23) using the Refinement Mapping theorem with the mapping function $Proj$. Notice that this is not as simple as it might appear. If all actions of $\mathcal{N}^+$ that change the projected state were actions of $\mathcal{N}$, it would be quite easy. Unfortunately there is a complication: some actions of $\mathcal{N}^+$ can cause a local reset. An example is the receipt of a $p_{resp}$ packet with $p_{resp}.mode = reset$. Such actions change the projected state but are not actions of $\mathcal{N}$. However, we will show that such actions cannot appear in executions of $\mathcal{N}^+|Q$. This is because $Q$ is a strong enough predicate to ensure not only that property $L$ holds in the projected state, but also that no local reset actions are enabled.

We continue to let $s|u$ denote the current state of $\mathcal{N}^+_u$ projected onto the original automaton $\mathcal{N}_u$. We also use $s|(u,v)$ to denote the state of $C_{u,v}$ projected onto the original automaton $\mathcal{N}$. In other words, $s|(u,v) = Proj(s).Q_{u,v}$.

5.6.2 Phases 

We formally define a phase (see Figure 5.4) on a link as an interval during which the checking/correction procedure works on that link. As we have seen, the checking/correction procedure works by setting $phase_u[v] = true$ and attempting to send out a numbered request packet. If a response is received with a matching number, then $phase_u[v]$ is set to false. We make this more precise below.

Definition 5.6.2 A $(u,v)$ phase for execution $\alpha$ is any interval $\gamma$ of $\alpha$ such that

• $u = l(u,v)$

• $\gamma$ begins with a state $s_i$ of $\alpha$ such that $s_i.phase_u[v] = false$ and $s_{i+1}.phase_u[v] = true$.

• $\gamma$ ends with the first state $s_j$, $j > i$, of $\alpha$ in which $s_j.phase_u[v] = false$.

Notice that a $(u,v)$ phase is only defined for a leader edge $(u,v)$.

The definition of a phase allows the last state of a phase to overlap the first state of the next phase. Clearly we can divide an execution $\alpha$ into consecutive $(u,v)$ phases as well as intervals that lie between $(u,v)$ phases. Thus we can speak of the $i$-th $(u,v)$ phase in $\alpha$ in this division.
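The division of an execution into phases is mechanical once the sequence of values of $phase_u[v]$ is known. The Python sketch below (the function `phases` is hypothetical) applies Definition 5.6.2 to such a sequence, skipping an initial partial phase and allowing the last state of one phase to be the first state of the next.

```python
def phases(trace):
    """Split a trace of phase_u[v] values into (u, v) phases.

    `trace[k]` is the value of phase_u[v] in state s_k. A phase starts at
    an index i with trace[i] False and trace[i + 1] True, and ends at the
    first later index j with trace[j] False. A leading run of True values
    (a "partial phase") and a phase still open at the end are not reported.
    Returns a list of (i, j) index pairs.
    """
    result = []
    i, n = 0, len(trace)
    while i + 1 < n:
        if not trace[i] and trace[i + 1]:
            j = i + 1
            while j < n and trace[j]:
                j += 1
            if j == n:          # the phase has not ended within the trace
                break
            result.append((i, j))
            i = j               # the last state of a phase may start the next one
        else:
            i += 1
    return result


print(phases([True, True, False, True, True, False, False, True, False]))  # [(2, 5), (6, 8)]
print(phases([False, True, False, True, False]))                          # [(0, 2), (2, 4)]
```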

Our first lemma states that between phases, at most one data packet is sent on $C_{u,v}$. Thus we can think of an execution (from the point of view of any link $C_{u,v}$) as alternating between a phase, during which requests are sent and responses are received, and a period during which at most one data packet is sent on $C_{u,v}$.

Lemma 5.6.3 Between two consecutive $(u,v)$ phases on link $(u,v)$, at most one $\mathrm{SEND}_{u,v}(p_{data})$ event can occur.

Proof: The first $\mathrm{SEND}_{u,v}(p_{data})$ event after the $i$-th $(u,v)$ phase sets $phase_u[v] = true$, which begins the $(i+1)$-st phase. ∎

The proofs of the next two lemmas are quite tedious and are relegated to the 
appendix. The first lemma states that a single phase completes in constant time. 
Intuitively, the time taken to complete a phase consists of the time to start a phase 
(which may involve the sending of a data packet) followed by the time to send a 
request and receive a matching response. The second lemma shows that data packets 
are sent out on a link in a bounded time after they are placed on the queue for the 
link. Intuitively, this is because a phase takes constant time to complete and if the 
data queue is non-empty, the code sends at least one data packet between consecutive 
phases. 

Before we state these lemmas, we define a quantity $t_p$. Intuitively, $t_p$ is the time it takes to complete a phase.

Definition 5.6.4 We let $t_p = 6t_n + 12t_l$.

The major components of the time to complete a phase are sketched in Figure 5.10. Since the free variables may be incorrect in the initial state, the first packet sent on a link can be dropped. However, within constant time, the links stop dropping packets. Then, after the possible sending of a data packet, a request is sent out by $u$ in constant time. Next, after the possible sending of a data packet, a response is sent by $v$ in constant time. The appendix contains more details.
Figure 5.10: The major components of the time required to complete a phase.

Lemma 5.6.5 Phase Rate: For all $\alpha$, any leader edge $(u,v)$ and any integer $x$, at least $x$ $(u,v)$ phases will have completed within $(x+1) \cdot t_p$ time units after the start of $\alpha$.

Proof: From Lemma B.3.4 and Lemma B.3.5 in Section B.3 of the appendix. The reason we need $(x+1) \cdot t_p$ time (instead of $x \cdot t_p$ time) to complete $x$ phases is as follows. Suppose that $phase_u[v] = true$ in the initial state of $\alpha$; it may take $t_p$ time before $phase_u[v]$ becomes false. Thus, the first $(u,v)$ phase in $\alpha$ may be a "partial phase". However, the definition of a phase does not allow us to consider a "partial phase" as a phase. ∎

Lemma 5.6.6 Data Packet Rate: For any $\alpha = s_0, a_1, s_1, \ldots$ and any $(u,v)$, if $|s_0.queue_u[v]| > 0$ then a $\mathrm{SEND}_{u,v}(p_{data})$ event occurs within $t_p$ time units after $s_0$.

Proof: At the end of Section B.3 in the appendix. ∎



5.6.3 Clean Phases 

We know that a link can drop packets if a packet is sent when the link already has a packet. To be sure that packets will not be dropped, the sender must never think that a link is free when it isn't. When this holds, we say the link is "drop-free".

Figure 5.11: The key variables used in the definition of a clean link. In the figure, $count_u[v]$ is the count at the sender $u$, $count_v[u]$ is the count at the receiver $v$, $reqcount(u,v)$ is the count carried in a request, $respcount(u,v)$ is the count carried in a response, and $phase_u[v]$ is set to true during a phase. $countset(u,v)$ is the union of the count at the receiver and any count values in request and response packets. A clean link ensures that between phases, the sender's count value is not in $countset(u,v)$.

Definition 5.6.7 Let $F_{u,v}$ denote the predicate of $\mathcal{N}^+$ defined by $(free_u[v] = true) \rightarrow (Q_{u,v} = nil)$. We also say that $(u,v)$ is drop-free in state $s$ of $\mathcal{N}^+$ if $s \in F_{u,v}$.

In the appendix, we show that once a link is drop-free, it remains drop-free. Intuitively, this is because the sender will not record the link as free unless it receives a free signal from the link, which means that there is no packet on the link. In the appendix, we also show that a link $(u,v)$ becomes drop-free in constant time: in fact, after the first packet is sent on the link. This follows because after the first packet is sent on the link, the sender records the link as being busy, and this trivially satisfies the drop-free predicate.
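Both facts can be checked exhaustively on a toy model. The Python sketch below is a hypothetical abstraction, not the UDL automaton of Section 5.1.2: the sender's belief is a boolean `free`, the link holds at most one packet in `q`, and there are three actions (the sender sends when it believes the link is free and then marks it busy; the link delivers its packet; an empty link signals free). It checks that $F_{u,v}$ is closed under every action and that it holds after any send, even from a corrupted start state.

```python
from itertools import product


def drop_free(free, q):
    """F_{u,v}: if the sender believes the link is free, the link is empty."""
    return (not free) or q is None


def successors(free, q):
    """(action, next_state) pairs of a toy unit-storage link with a sender flag."""
    out = []
    if free:
        # SEND: the slot now holds a packet (any packet already there is lost);
        # the sender records the link as busy.
        out.append(("send", (False, "pkt")))
    if q is not None:
        out.append(("deliver", (free, None)))   # RECEIVE at the other end
    else:
        out.append(("free", (True, None)))      # FREE signal from an empty link
    return out


states = list(product([False, True], [None, "pkt"]))
closed = all(drop_free(*t) for s in states if drop_free(*s) for _, t in successors(*s))
after_send = all(drop_free(*t) for s in states for a, t in successors(*s) if a == "send")
print(closed, after_send, drop_free(True, "pkt"))  # True True False
```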

Before studying the structure of phases, we introduce some notation to denote the 
set of counter values in a (u,v) subsystem. These include counters in any request or 
response packets, and the counter stored at node v. These variables are sketched in 
Figure 5.11. 

Definition 5.6.8 For any state $s$ we define the derived variables $reqcount(u,v)$, $respcount(u,v)$ and $countset(u,v)$ as follows:

1. If $Q_{u,v} = p_{req}$ then $reqcount(u,v) = p_{req}.count$; otherwise $reqcount(u,v) = undefined$.

2. If $Q_{v,u} = p_{resp}$ then $respcount(u,v) = p_{resp}.count$; otherwise $respcount(u,v) = undefined$.

3. $countset(u,v)$ is the set formed by the union of the values in $respcount(u,v)$, $reqcount(u,v)$ and $count_v[u]$.

In order for a $(u,v)$ phase to work correctly, we need the phase to follow the structure described in Figure 5.4. For this to happen, when a numbered request is first sent during a phase, the number of the request should not already be present at the receiver or in the channels. Otherwise, it is possible for an incorrect response to be accepted in the phase. We formalize this notion of a link and a phase being "clean" using five conditions. The second condition is the crucial one; the other four are supporting conditions required to ensure that the clean predicate is closed.

Definition 5.6.9 We say that a leader edge $(u,v)$ is clean in state $s$ of $\mathcal{N}^+$ iff all the following predicates are true in $s$:

1. $(v,u)$ and $(u,v)$ are drop-free.

2. If $phase_u[v] = false$ then $count_u[v] \notin countset(u,v)$.

3. If $phase_u[v] = true$ then $reqcount(u,v) = count_u[v]$ or $reqcount(u,v) = undefined$.

4. If $phase_u[v] = true$ and $respcount(u,v) = count_u[v]$ then $respcount(u,v) = count_v[u]$.

5. If $count_v[u] = count_u[v]$ then $Q_{u,v} \notin P_{data}$.

The first condition ensures that the links in both directions are drop-free. The 
second condition ensures that when a numbered request is first sent during a phase, 
the number of the request is not already present at the receiver or in the channels. 
This is the important property of a clean link. 

The third condition states that during a phase, any requests must carry the sender's 
number. The fourth condition states that during a phase if there is a matching response 
on the channel from the receiver to the sender, then the receiver has the same number 
as the sender. The fifth condition says that if during a phase the sender and receiver 
numbers are the same, then there can be no data packet in transit from the sender to
the receiver. The fifth condition (as we will see below) follows from the fact that the 
sender will not send a data packet during a phase. 
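The five conditions can be read as a predicate on a snapshot of one leader edge. The Python sketch below is a hypothetical encoding, not the thesis's state space: `EdgeState` and its field names are ours, packets are tagged tuples, `None` represents both an empty channel and an undefined count, and drop-freedom (condition 1) is taken as given rather than computed.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class EdgeState:
    """Hypothetical snapshot of the variables that matter for leader edge (u, v)."""
    phase_u: bool                        # phase_u[v]
    count_u: int                         # count_u[v], the sender's count
    count_v: int                         # count_v[u], the receiver's count
    q_uv: Optional[Tuple[str, object]]   # C_{u,v}: ("req", count), ("data", payload) or None
    q_vu: Optional[Tuple[str, object]]   # C_{v,u}: ("resp", count) or None
    drop_free_uv: bool = True            # F_{u,v}, assumed to be evaluated elsewhere
    drop_free_vu: bool = True            # F_{v,u}


def reqcount(s):
    return s.q_uv[1] if s.q_uv is not None and s.q_uv[0] == "req" else None


def respcount(s):
    return s.q_vu[1] if s.q_vu is not None and s.q_vu[0] == "resp" else None


def countset(s):
    # None stands for "undefined" and is left out of the set.
    return {c for c in (reqcount(s), respcount(s), s.count_v) if c is not None}


def is_clean(s):
    data_in_transit = s.q_uv is not None and s.q_uv[0] == "data"
    return all([
        s.drop_free_uv and s.drop_free_vu,                      # condition 1
        s.phase_u or s.count_u not in countset(s),              # condition 2
        not s.phase_u or reqcount(s) in (None, s.count_u),      # condition 3
        not (s.phase_u and respcount(s) == s.count_u)
        or respcount(s) == s.count_v,                           # condition 4
        s.count_v != s.count_u or not data_in_transit,          # condition 5
    ])


print(is_clean(EdgeState(False, 1, 0, None, ("resp", 0))))   # True
print(is_clean(EdgeState(False, 2, 0, None, ("resp", 2))))   # False: stale response carries count 2
print(is_clean(EdgeState(True, 3, 3, ("data", "x"), None)))  # False: violates condition 5
```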

Definition 5.6.10 A (u,v) phase p is clean if (u,v) is clean in the first state of p. 

Intuitively, a clean phase will contain an action to send a request packet at $u$, followed by the receipt of this packet at $v$, followed by the sending of a response by $v$, followed by the receipt of the response at $u$. The receipt of the matching response ends the phase. This is shown in Figure 5.4. Thus a clean phase ensures that the response received at node $u$ corresponds to the request sent by node $u$ in that phase.

A nice property is that once an edge becomes clean, it remains clean. 

Lemma 5.6.11 For any transition $(s, \pi, s')$ and any leader edge $(u,v)$, if $(u,v)$ is clean in $s$ then $(u,v)$ is clean in $s'$.

Proof: See Section B.4 in the appendix. However, it is not hard to see informally why this is true. First, as we have seen before, once a link is drop-free, it remains drop-free. Next, we know from the fourth condition that a matching response can only be received if the receiver has the same number as the sender. Also, from the third condition, any requests present during a phase must have the same number as the sender. Now the sender always increments its number after receiving a matching response. Thus after a phase is over, the number of the sender must be different from any of the numbers present in the receiver or channels: this is the second condition. The third condition follows from the fact that a phase begins by sending a request with the sender's number, and subsequent retransmissions of requests carry the same number. Next, if a matching response is present, it must have been sent by the receiver during the phase. But, by the third condition, the receiver cannot change its number after it sent the response, since every request it can receive carries the sender's number. Thus if there is a matching response, the number of the receiver must be the same as the sender's, which is the fourth condition.

The fifth condition follows because the sender never sends a data packet during a phase. By the second condition, at the start of the phase the receiver number is not equal to the sender number; thus the two numbers become the same only after the receiver has received a request. But the receipt of this request "flushes" out any data packets that were in transit from sender to receiver at the start of the phase. This (taken together with the fact that the sender never sends a data packet during a phase) is what makes the fifth condition hold. ∎

The following lemma explains why the request-response matching procedure is 
stabilizing. 

Lemma 5.6.12 A leader edge $(u,v)$ is clean in all states after the fifth $(u,v)$ phase in any execution $\alpha$.

Proof: (Idea) We first show that after at most two phases, $(v,u)$ and $(u,v)$ are both drop-free. This follows (see Claim B.2.2 in the appendix) since some packet (i.e., at least one request and at least one response) must have been sent on each link by this time. Next consider the end of the second phase. In this state, there must be some $c \in \{0, \ldots, 3\}$ such that $c \notin countset(u,v)$. This follows because, by definition, $countset(u,v)$ has at most three elements. Now consider the first time after the end of the second phase that $count_u[v] = c$. This must occur at the end of a phase because $count_u[v]$ is only incremented at the end of a phase. Also this must occur at or before the end of the fifth phase because $count_u[v]$ increases by 1 mod 4 at the end of each phase.

It is easy to see using Claim B.4.1 that when $count_u[v]$ first becomes equal to $c$, $c \notin countset(u,v)$. Thus at or before the end of the fifth phase, $phase_u[v] = false$ and $count_u[v] \notin countset(u,v)$, so edge $(u,v)$ is clean. Finally, by Lemma 5.6.11, $(u,v)$ is clean in all subsequent states of $\alpha$. ∎
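The counter flushing argument can be exercised on a toy model. The Python sketch below is a hypothetical abstraction, not the automaton $\mathcal{N}^+$: it keeps only the counters of one leader edge, omits data packets and mode fields, assumes both channels are already drop-free unit-storage links, and lets a random scheduler interleave five actions (start a phase, send or receive a request, send or receive a response). From arbitrary initial counters and stale packets, it counts how often condition 2 of Definition 5.6.9 fails between phases once five complete phases have elapsed.

```python
import random

MOD = 4  # count_u[v] increases by 1 mod 4 at the end of each phase


def countset(st):
    """Counter values at the receiver or carried by packets in transit."""
    vals = {st["cv"]}
    for key in ("req", "resp"):
        if st[key] is not None:
            vals.add(st[key])
    return vals


def enabled(st):
    acts = []
    if not st["phase"]:
        acts.append("start_phase")
    elif st["req"] is None:
        acts.append("send_req")       # u (re)sends a request carrying count_u[v]
    if st["req"] is not None:
        acts.append("recv_req")       # v adopts the request's count
    if st["pending"] and st["resp"] is None:
        acts.append("send_resp")      # v answers with its current count
    if st["resp"] is not None:
        acts.append("recv_resp")      # u receives a response
    return acts


def step(st, act):
    if act == "start_phase":
        st["phase"], st["started"] = True, True
    elif act == "send_req":
        st["req"] = st["cu"]
    elif act == "recv_req":
        st["cv"], st["req"], st["pending"] = st["req"], None, True
    elif act == "send_resp":
        st["resp"], st["pending"] = st["cv"], False
    elif act == "recv_resp":
        c, st["resp"] = st["resp"], None
        if st["phase"] and c == st["cu"]:      # a matching response ends the phase
            st["phase"] = False
            st["cu"] = (st["cu"] + 1) % MOD
            if st["started"]:                  # an initial partial phase is not counted
                st["completed"] += 1


def run(seed, steps=400):
    """Random execution from a random (possibly corrupted) initial state.

    Returns (violations of condition 2 seen after five complete phases,
    number of complete phases).
    """
    rng = random.Random(seed)
    st = {
        "cu": rng.randrange(MOD), "cv": rng.randrange(MOD),
        "phase": rng.random() < 0.5, "pending": rng.random() < 0.5,
        "req": rng.choice([None, 0, 1, 2, 3]), "resp": rng.choice([None, 0, 1, 2, 3]),
        "started": False, "completed": 0,
    }
    violations = 0
    for _ in range(steps):
        step(st, rng.choice(enabled(st)))
        if st["completed"] >= 5 and not st["phase"] and st["cu"] in countset(st):
            violations += 1
    return violations, st["completed"]


results = [run(seed) for seed in range(200)]
print(sum(v for v, _ in results))  # 0
```

Because the toy channels never drop packets, the model is drop-free from the start; the five-phase threshold simply mirrors Lemma 5.6.12.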

Now we have reached the goal of this subsection. First we define: 

Definition 5.6.13 Let $C$ be the predicate of $\mathcal{N}^+$ such that for all leader edges $(u,v)$, $(u,v)$ is clean in all states in $C$.

Recall that $\mathcal{N}^+|C$ is the automaton that is identical to $\mathcal{N}^+$ except that all leader edges are clean in any initial state.

Theorem 5.6.14 $\mathcal{N}^+$ stabilizes to the executions of $\mathcal{N}^+|C$ in time $6t_p$.

Proof: Follows directly from Lemma 5.6.11, Lemma 5.6.12, Lemma 5.6.5 and the Execution Convergence theorem, Theorem 3.4.5. Lemma 5.6.12 shows that each leader edge becomes clean after at most 5 phases, which by Lemma 5.6.5 takes at most $6t_p$ time. Lemma 5.6.11 shows that once a leader edge is clean it remains clean. Then the Execution Convergence theorem (Theorem 3.4.5) shows that all leader edges become clean in at most $6t_p$ time. ∎
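For concreteness, the constant $6t_p$ is Lemma 5.6.5 instantiated with the five phases of Lemma 5.6.12, expanded using Definition 5.6.4:

```latex
x = 5 \;\Longrightarrow\; (x+1)\,t_p = 6\,t_p = 6\,(6t_n + 12t_l) = 36\,t_n + 72\,t_l
```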

5.6.4 Quiet Links: Establishing Link Predicates 

Clean phases are useful only because they allow local predicates to be established, as we show in this section. We know from the last section that every execution has a suffix that is clean. When we say that $L_{u,v}$ holds in a state $s$ of $\mathcal{N}^+$, we mean that $(s|u, s|(u,v), s|(v,u), s|v) \in L_{u,v}$.

Recall that $<$ is the partial order on edges associated with the reset function $f$. Consider a link $(u,v)$ such that there is no other link $\{w,x\} < \{u,v\}$. We will show that if $mode_u[v] = snapshot$ in the first state of a clean $(u,v)$ phase, then at the end of the phase a true snapshot of the $(u,v)$ subsystem is obtained. In particular, it is possible at the end of such a phase to determine whether $L_{u,v}$ holds at the end of the phase. If it does not hold, $mode_u[v]$ is changed to reset. In a similar fashion, we can show that any clean phase whose initial state has $mode_u[v] = reset$ will guarantee that $L_{u,v}$ holds at the end of the phase. The net effect is that $L_{u,v}$ holds by the end of the second $(u,v)$ phase of a "clean" execution.

However, we want to show not only that $L_{u,v}$ holds by the end of the second $(u,v)$ phase but also that no more "reset" actions can occur after this point, so that $L_{u,v}$ will remain true. This motivates the following definition of a quiet link. Please refer to Figure 5.12 for an intuitive explanation.

Definition 5.6.15 We say that a leader edge $(u,v)$ is quiet in state $s$ of $\mathcal{N}^+$ iff the following predicates hold in $s$:

1. $(u,v)$ is clean.

2. $(s|u, s|(u,v), s|(v,u), s|v) \in L_{u,v}$.

3. $(mode_u[v] \neq reset)$ and $(mode_v[u] \neq reset)$.

4. If $Q_{u,v} = p_{req}$ and $p_{req}.mode = reset$ then $p_{req}.count = count_v[u]$.

5. If $Q_{v,u} = p_{resp}$ and $p_{resp}.count = count_u[v]$ then $(s|u, nil, nil, p_{resp}.nodestate) \in L_{u,v}$.

Figure 5.12: A leader edge is quiet if it is clean, its local predicate holds, and it satisfies the conditions sketched in the figure: the leader mode and the responder mode should not be reset; any request that is not outdated must not be a reset request; and any response will not cause the leader to change its mode to reset.

Notice that the fourth condition above ensures that there is no reset request on the 
link that could be accepted by the receiver at a later state. The fifth condition ensures 
that any snapshot information in a matching response will not cause the sender to 
change its mode to reset. 

Our goal is to show that eventually all links are quiet. 

Definition 5.6.16 Let $\mathcal{Q}$ be the predicate set consisting of the following predicate for each leader edge in $G$: the predicate for leader edge $(u,v)$ is that $(u,v)$ is quiet. Let $Q$ be the predicate that is the intersection of all predicates in $\mathcal{Q}$.

We extend the partial order $<$ (which was defined on undirected pairs of nodes) to leader edges by assuming that leader edge $(u,v) <$ leader edge $(w,x)$ iff $\{u,v\} < \{w,x\}$. Next, we will use the Execution Convergence theorem (Theorem 3.4.5) to show that $\mathcal{N}^+$ stabilizes to the executions of $\mathcal{N}^+|Q$. We know that $\mathcal{N}^+$ stabilizes to the executions of $\mathcal{N}^+|C$. Thus it is sufficient to show that $\mathcal{N}^+|C$ is stabilized to $Q$ using predicate set $\mathcal{Q}$ and partial order $<$. To show this, we first prove two lemmas.
First we state the required stability condition: a leader edge $(u,v)$ will remain quiet as long as all leader edges less than or equal to $(u,v)$ are quiet.

Lemma 5.6.17 Consider a leader edge $(u,v)$. If every leader edge $(w,x) \le (u,v)$ is quiet in some state $s$ of $\mathcal{N}^+|C$, then for any transition $(s, \pi, s')$, $(u,v)$ is quiet in state $s'$.

Proof: In Section B.5 in the appendix. However, the basic idea is simple. We have already seen that the clean predicate is stable. We also know from the definition of local checkability that $L_{u,v}$ is closed for all the actions of the original automaton $\mathcal{N}$. However, in $\mathcal{N}^+$ we added actions to send requests and responses. It is easy to verify that the sending of snapshot requests and responses does not affect $L_{u,v}$. However, $L_{u,v}$ can be affected by actions that send reset requests and responses on edges to neighbors that are less than $(u,v)$ in the partial order. (If such "reset" actions occur on edges to neighbors that are not less than $(u,v)$ in the partial order, then the definition of the local correction function ensures that $L_{u,v}$ remains true.) But such "reset actions" cannot occur on edges less than $(u,v)$ in the partial order, because such edges are quiet by hypothesis.

Next, if $L_{u,v}$ holds and the snapshot information in matching responses is always correct, the sender will never change its mode to reset. But if the sender never changes its mode to reset, the sender will never send a reset request. And if the receiver never receives a reset request, the receiver will never change its mode to reset. ∎

Next we state the required liveness condition: a leader edge $(u,v)$ will become quiet in bounded time if all leader edges less than $(u,v)$ are already quiet.

Lemma 5.6.18 Consider a leader edge $(u,v)$ and some execution $\alpha$ of $\mathcal{N}^+|C$. If every leader edge $(w,x) < (u,v)$ is quiet in some state $s_i$ of $\alpha$, then $(u,v)$ is quiet in some state that occurs within $3 \cdot t_p$ of $s_i$ in $\alpha$.

Proof: In Section B.5 in the appendix. However, the basic idea is simple. Consider the first complete $(u,v)$ phase after $s_i$. If it is a snapshot phase and $L_{u,v}$ does not hold at the end of the phase, this will be detected and the next phase will become a reset phase. Thus either the first or the second complete phase after $s_i$ will be a reset phase. At the end of the reset phase, $(u,v)$ will become quiet because all the links that $(u,v)$ depends on are already quiet. The upshot is that within two complete phases, $(u,v)$ becomes quiet. But this can take up to $3t_p$ time (where $t_p$ is the time to complete a single phase) because the first phase after $s_i$ can be an incomplete phase. ∎

The last two lemmas immediately give us:

Lemma 5.6.19 $\mathcal{N}^+|C$ is stabilized to $Q$ using predicate set $\mathcal{Q}$ and time constant $3t_p$.

Thus, applying the Execution Convergence theorem (Theorem 3.4.5) and using the last lemma, we get:

Lemma 5.6.20 $\mathcal{N}^+|C$ stabilizes to the executions of $\mathcal{N}^+|Q$ in time $height(f) \cdot 3t_p$.

For convenience we define a quantity $t_q$, which intuitively can be thought of as the time after which any execution of $\mathcal{N}^+$ becomes quiet.

Definition 5.6.21 We let $t_q = 6t_p + height(f) \cdot 3t_p$.
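Unfolding the definition shows where each term comes from: $6t_p$ is the cleaning time of Theorem 5.6.14, and each of the $height(f)$ levels of the partial order $<$ associated with $f$ contributes the $3t_p$ of Lemma 5.6.18, which is Lemma 5.6.5 with $x = 2$. For example, a partial order of height 3 gives $t_q = 15t_p$:

```latex
t_q = 6\,t_p + height(f)\cdot 3\,t_p
    = \bigl(6 + 3\,height(f)\bigr)\,t_p
    = \bigl(6 + 3\,height(f)\bigr)\,(6t_n + 12t_l),
\qquad
height(f) = 3 \;\Longrightarrow\; t_q = 15\,t_p = 90\,t_n + 180\,t_l .
```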

Thus we have the major result of this section:

Lemma 5.6.22 $\mathcal{N}^+$ stabilizes to the executions of $\mathcal{N}^+|Q$ in time $t_q$.

Proof: Follows directly from Theorem 5.6.14, Lemma 5.6.20 and transitivity (Lemma 3.1.6). ∎

5.6.5 Projecting Behaviors of $\mathcal{N}^+|Q$

In this subsection, we prove that if all leader edges are quiet in the initial state of some execution $\alpha$ of $\mathcal{N}^+$, then the external behavior corresponding to $\alpha$ is a behavior of $\mathcal{N}|L$. Intuitively, this is not very hard to believe, because for every leader edge $(u,v)$, $L_{u,v}$ holds in every projected state of $\alpha$. We also know that there will be no reset transitions in $\alpha$. Formally:

Lemma 5.6.23 Every behavior of $\mathcal{N}^+|Q$ is a behavior of $\mathcal{N}(t_p)|L$.
Proof: Follows by using a refinement mapping (Theorem 3.4.3) with the mapping function $Proj$ (Definition 5.6.1).

First, for any state $s$ of $\mathcal{N}^+|Q$ and any leader edge $(u,v)$, we know (from the definition of $Q$) that $(s|u, s|(u,v), s|(v,u), s|v) \in L_{u,v}$. Thus $Proj(s) \in L$. Thus for any start state $s$ of $\mathcal{N}^+|Q$, $Proj(s)$ is a start state of $\mathcal{N}(t_p)|L$.

Next consider any transition $(s, \pi, s')$ of $\mathcal{N}^+|Q$. We know from the definition of $Q$ that there are no reset transitions in $\mathcal{N}^+|Q$. Suppose $\pi$ is a $\mathrm{SEND}_{u,v}(p_{req})$, $\mathrm{RECEIVE}_{u,v}(p_{req})$, $\mathrm{SEND}_{v,u}(p_{resp})$, or $\mathrm{RECEIVE}_{v,u}(p_{resp})$ action for any $(u,v)$. Then it is easy to see that $Proj(s) = Proj(s')$. But such actions are not actions of $\mathcal{N}$.

If $\pi$ is a $\mathrm{SEND}_{u,v}(p_{data})$ action for some edge $(u,v)$, then $p_{data}$ must be at the head of $s.queue_u[v]$ and $s.free_u[v] = true$. Since $Proj(s).queue_u[v] = s.queue_u[v]$ and $Proj(s).free_u[v] = s.free_u[v]$, the $\mathrm{SEND}_{u,v}(p_{data})$ event is also enabled in $Proj(s)$. Also, $s'.Q_{u,v} = Proj(s').Q_{u,v} = p_{data}$, since $(u,v)$ is drop-free in $s$. Thus $(Proj(s), \mathrm{SEND}_{u,v}(p_{data}), Proj(s'))$ is a transition of $\mathcal{N}(t_p)|L$. Similarly, if $\pi$ is a $\mathrm{RECEIVE}_{u,v}(p_{data})$ action for some edge $(u,v)$, then it must be that $s.Q_{u,v} = p_{data}$ and $s'.Q_{u,v} = nil$. Thus $Proj(s).Q_{u,v} = p_{data}$ and $Proj(s').Q_{u,v} = nil$, and $(Proj(s), \mathrm{RECEIVE}_{u,v}(p_{data}), Proj(s'))$ is a transition of $\mathcal{N}(t_p)|L$. Next, if $\pi$ is a $\mathrm{FREE}_{u,v}$ action for some edge $(u,v)$, then it must be that $s.Q_{u,v} = nil$ and $s'.Q_{u,v} = nil$. Thus $Proj(s).Q_{u,v} = nil$ and $Proj(s').Q_{u,v} = nil$, and $(Proj(s), \mathrm{FREE}_{u,v}, Proj(s'))$ is a transition of $\mathcal{N}(t_p)|L$.

Next, suppose that $\pi$ is any other action of $\mathcal{N}(t_p)|L$. (For example, this category would include internal actions of $\mathcal{N}(t_p)|L$.) Such actions remain unmodified, and the values of all node variables are identical in $s$ and $Proj(s)$. Thus $(Proj(s), \pi, Proj(s'))$ is a transition of $\mathcal{N}(t_p)|L$.

Finally, consider the timing properties. Suppose a $\mathrm{SEND}_{u,v}(p_{data})$ event is enabled in $Proj(s)$ for any edge $(u,v)$. Then $|s.queue_u[v]| > 0$. Thus by Lemma 5.6.6, a $\mathrm{SEND}_{u,v}(p_{data})$ event occurs within $t_p$ time after $s$. Suppose a $\mathrm{RECEIVE}_{u,v}(p_{data})$ event is enabled in $Proj(s)$ for any edge $(u,v)$. Then $s.Q_{u,v} = p_{data} \neq nil$. From the timing properties of $\mathcal{N}^+|Q$ we know that a $\mathrm{RECEIVE}_{u,v}(p_{data})$ will occur within $t_l$ time, and $t_l < t_p$.

Consider a $\mathrm{FREE}_{u,v}$ action for any edge $(u,v)$. Within $t_l$ time units after $s$, either a $\mathrm{FREE}_{u,v}$ action will occur or there is a state $s'$ such that $s'.Q_{u,v} \neq nil$. Consider the second case. From Claim B.3.1, within $t_l$ time units after $s'$, there is a state $s''$ such that $s''.Q_{u,v} = nil$. Since $(u,v)$ is drop-free in all states of $\alpha$, $s'.free_u[v] = false$; and since $free_u[v]$ becomes true only when a free signal is received, $s''.free_u[v] = false$ unless a $\mathrm{FREE}_{u,v}$ action has already occurred. Thus, because the sender sends only when $free_u[v] = true$, $Q_{u,v} = nil$ will remain true until a $\mathrm{FREE}_{u,v}$ action occurs within time $t_l$ after $s''$. Thus a $\mathrm{FREE}_{u,v}$ action will occur within $3t_l$ time units after state $s$. In particular, since $3t_l \le t_p$, a $\mathrm{FREE}_{u,v}$ action will occur within $t_p$ time units of any state $s$ such that $\mathrm{FREE}_{u,v}$ is enabled in $Proj(s)$.

It remains to consider any other locally controlled action $\pi$ of $\mathcal{N}(t_p)|L$. Then $\pi$ is an internal action of $\mathcal{N}(t_p)|L$. Notice that $\pi$ is in the same class, say $c$ (with upper bound equal to $t_c$), in both $\mathcal{N}^+|Q$ and $\mathcal{N}(t_p)|L$. Suppose that $\pi$ is enabled in $Proj(s)$. Then $\pi$ is enabled in $s$. Then within $t_c$ time of $s$ in $\alpha$, either some action in class $c$ occurs or there is some state $s'$ such that no action in class $c$ is enabled in $Proj(s')$. This follows because if no action in class $c$ is enabled in $s'$, then no action in class $c$ is enabled in $Proj(s')$, and vice versa. ∎

5.6.6 Tying up the Proof 

We now return to the proof of the Local Correction Theorem, Theorem 5.4.3. It follows 
from Lemma 5.6.23 and Lemma 5.6.22 and Theorem 3.4.2. 



5.7 Implementing Local Checking in Real Networks 

In this chapter, we described a stabilizing snapshot and reset protocol for link subsystems. This protocol was used to transform an automaton $\mathcal{N}$ that was locally correctable to some predicate $L$ into a UIOA $\mathcal{N}^+$ that stabilizes to the behaviors of $\mathcal{N}(c)|L$ for some constant $c$. To make the snapshot/reset protocol stabilizing we numbered snapshot requests and responses; we also relied on the fact that each link was a UDL. In practice, however, there is an even simpler way of making the snapshot/reset protocol stabilizing: using timers.

Suppose there is a known bound on the length of time a packet can be stored in a link and a known bound on the length of time between the delivery of a request packet and the sending of a matching response packet. Then by controlling the interval between successive snapshot/reset phases it is easy to obtain a stabilizing snapshot protocol. The interval is chosen to be large enough that all packets from the previous phase will have disappeared by the end of the interval. This solution was advocated by us in [APV91a] and was used in a trial implementation on the Autonet [MAM+90].
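A minimal sketch of why the spacing works, under assumptions of our own: each phase sends a single request at its start, a request spends at most `d_link` in a link, the responder answers within `d_resp`, and the sender treats any response heard before the next phase starts as belonging to the current phase. The names `min_phase_gap` and `check_timer_spacing` are hypothetical; this is not the implementation of [APV91a] or the Autonet.

```python
import random


def min_phase_gap(max_link_delay, max_response_delay):
    """Worst-case lifetime of the packets of one phase: the request spends at
    most max_link_delay in the link, the response is sent at most
    max_response_delay after delivery, and spends at most max_link_delay in
    the reverse link."""
    return 2 * max_link_delay + max_response_delay


def check_timer_spacing(num_phases, d_link, d_resp, seed=0):
    """Return True if every response heard during a phase belongs to it.

    Phase i sends one request at starts[i] and listens until the next phase
    starts; phases are spaced by slightly more than min_phase_gap.
    """
    rng = random.Random(seed)
    worst = min_phase_gap(d_link, d_resp)
    gap = 1.01 * worst                       # strictly larger than the worst case
    starts = [gap * (i + 1) for i in range(num_phases)]
    # Leftovers from an arbitrary initial state (tag -1) are gone by time `worst`.
    arrivals = [(rng.uniform(0, worst), -1) for _ in range(3)]
    for i, t0 in enumerate(starts):
        delay = rng.uniform(0, d_link) + rng.uniform(0, d_resp) + rng.uniform(0, d_link)
        arrivals.append((t0 + delay, i))
    for t, tag in arrivals:
        for i, t0 in enumerate(starts):
            end = starts[i + 1] if i + 1 < len(starts) else t0 + gap
            if t0 <= t < end and tag != i:
                return False                 # a stale response would be accepted
    return True


print(min_phase_gap(1.0, 0.5))              # 2.5
print(check_timer_spacing(100, 1.0, 0.5))   # True
```

With no numbering at all, only the spacing keeps phases apart; the small margin over the worst case avoids a response arriving exactly as the next phase begins.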
To keep our model simple, however, we have assumed that all lower bounds (on
the time between events) are zero. Thus we have no way to model timers, which are 
needed to control the interval between phases. While we will not describe the timer 
based scheme formally, the reader interested in practical applications should be aware 
of the simplicity of a timer based scheme. Notice also that the maximum value of the 
timer need only be some small multiple of the worst-case round trip delay on a single 
link. We call such timers local timers. 

In most real networks, each node sends "keep-alive" packets periodically on every 
link in order to detect failures of adjacent links. If no keep-alive packet arrives before 
a local timer expires, the link is assumed to have failed. Thus, it is common practice 
to assume time bounds for the delivery and processing of packets. Note also that the 
snapshot and reset packets used for local checking can be "piggy-backed" on these 
keep-alive packets without any appreciable loss in efficiency. 



5.8 Summary 

The two main contributions of this chapter are the definition of Unit Storage Data 
Links in Section 5.1.2, and the notion of local checking and correction (Section 5.3 and 
Local Correction theorem, Theorem 5.4.3). 

In a stabilizing setting it is necessary to define Data Links that have bounded 
storage. First, such models correspond to physical reality. Second, they avoid the 
theoretical problems with unbounded storage links. We have chosen unit storage links 
(UDLs) because they are practical (see Section 5.2) and they can be modelled elegantly. 
We have also defined a stabilizing interface to a UDL. This is done by having the link 
periodically deliver a free signal (to avoid deadlock) and by having the sender keep a 
variable that indicates whether the link is free. We hope the UDL model will be used 
by others. 

Intuitively, a protocol is locally checkable if whenever the protocol is in a bad state, 
some pair of neighbors can detect this fact. Intuitively, a protocol is locally correctable 
if the protocol can be corrected to a good global state by independently correcting the 
states of each link subsystem. A link subsystem is just a pair of neighboring nodes 
and the links between them. 
Local checkability is not a very new or surprising idea. For example, many authors 
have proposed local methods for detecting termination and deadlocks. In a stabilizing 
setting, the intuitive notion of local checkability first appeared in [AKY90]. However, 
their reference to this concept (during the description of a spanning tree protocol) 
was brief and intuitive. 

Our contribution has been to make precise the notion of local checkability, and 
to show that this is a useful and pervasive concept. Our definition (Definition 5.3.6) 
has some subtle aspects: for example, the requirement that each local predicate be 
closed may not be immediately obvious. Also, we have implemented local checking by 
doing a snapshot of each link subsystem. Now, a number of practical, self-stabilizing 
protocols (e.g., [Per85]) do what essentially amounts to local checking in the following 
way: periodically, each node sends its state to all its neighbors. However, as we will 
show in Chapter 8, such periodic sending of state is not always sufficient to do local 
checking. Periodic sending is sufficient if all local predicates can be separated into 
what we call one-way predicates (see Chapter 8). In the general case, a snapshot is 
required. 

The idea of local correction is more unusual. It is perhaps surprising that there 
are non-trivial protocols with this property. As we will see in Chapter 6, an easy way 
to ensure local correction is to first build a spanning tree of the network and then do 
local correction using the tree. Luckily, there is another class of protocols that are 
locally correctable: these are protocols that work in dynamic networks in which links 
can fail and recover. We will see an important example of such protocols in Chapter 
7. 

Local checking and correction is practical. In most real networks, each node sends 
"keep-alive" packets periodically on every link. The packets used for local checking 
and correction can be "piggy-backed" on these keep-alive packets. 
Chapter 6 

Stabilizing Mutual Exclusion and 
Tree Correction 



The main result of Chapter 5 was the Local Correction theorem, Theorem 5.4.3. In this 
chapter we will consider a simple application of the Local Correction theorem to the 
problem of mutual exclusion. Section 6.1 gives an overview of our stabilizing mutual 
exclusion protocol. Section 6.2 formally defines our mutual exclusion protocol as a 
network automaton. Section 6.3 describes how local checking and correction are added 
to the automaton to create a stabilizing solution to the mutual exclusion problem. 

The last two sections of this chapter extract some general principles from the 
example of stabilizing mutual exclusion. In Section 6.4, we discuss a weaker notion of 
local checkability called weak local checkability. We show that in certain cases, a simple 
heuristic of removing unexpected packet transitions can be used to transform a protocol 
that is weakly locally checkable into a locally checkable protocol. This heuristic has 
proved to be quite useful, and is used in later chapters. 

Finally, in Section 6.5, we show (informally) an important result, the Tree Correction 
theorem. This theorem states that any locally checkable protocol on a tree 
topology can be efficiently stabilized. In other words, if the underlying topology is a 
tree we can dispense with the need for the (stronger) local correctability condition. 
This theorem is the counterpart to a similar theorem proved in Chapter 4 for shared 
memory systems. 
6.1 Overview of Token Passing Protocol 

Token passing is one way of ensuring mutual exclusion in a network. If there is only 
one token in a network, a node can enter the critical section when it receives the token. 
In the I/O automaton model this is typically modelled by adding an output action to 
every node by which the node can give permission to some external user to enter 
the critical section, and adding an input action that tells the node when the user has 
finished with the critical section. For simplicity we ignore this extra piece of modelling, 
although it is important for providing a modular interface to other subsystems. 

Token protocols that stabilize in time proportional to the height of the tree have 
been described before by [DolevIM90] in a shared memory setting. However, we believe 
our protocol is simpler and more transparent. Because the mutual exclusion protocol is 
so simple, it is a good example for understanding how the general method of local 
correction developed in Chapter 5 can be applied. 

Stabilizing spanning tree protocols have been well studied [AKY90],[AG90],[AV91]. 
Thus we can reduce the problem of token passing on an arbitrary connected network 
to that of token passing on a tree. Instead of modelling the interface to the network 
automaton that computes the tree, we will (for simplicity) assume that the network 
graph is a tree. Thus we are dealing with a network automaton of the form M = 
Net(G,N) where G is a tree graph. Formally: 

Definition 6.1.1 A tree graph T is a topology graph such that: 

• T is a rooted tree (i.e., there is a distinguished node called the root and there is 
a unique path between any node and the root). 

• For every edge (u,v) in T, the leader function l(u,v) = u if u is the parent of v 
in the tree, and l(u,v) = v if v is the parent of u in the tree. 

Definition 6.1.2 A tree automaton is a network automaton whose topology graph is 
a tree graph. 

For a tree graph we will refer to the leader function l as parent, for obvious reasons. 

The simplest mutual exclusion algorithm is to pass a token along some tour (e.g., 
DFS) of the tree; a node can go "critical" when it has the token. But such a protocol 
is not locally checkable: if there are two tokens on two different links, neither link 
subsystem may locally detect a problem. 

Figure 6.1: Adding a pointer to each node makes token passing on a tree locally checkable. 

Once we see this, it is easy to add a small amount of state to make the token protocol 
locally checkable. We add a pointer, pointer_u, to each node u such that pointer_u points 
towards the token in the tree; if the token is at node u, then pointer_u = nil. 
This is shown in Figure 6.1. 



6.2 Specification of Token Passing Protocol 

Our stabilizing token passing protocol is based on this idea and is specified by M = 
Net(T,N) where: 

• T is a tree graph. 

• Each N_u, u ∈ V, is a UIOA of the form described in Figure 6.2. 

We assume that there is a function neighborSet(u) that lists the set of neighboring 
nodes of node u. (Recall that we allowed node automata to depend on the set of 
neighboring nodes and the leader function.) We assume there is some fixed circular 
ordering on neighborSet(u) and that there is a function Succ_u which, when given a 
neighbor v as its argument, produces the next neighbor in the ordering. The only 
packet sent by this automaton is a token packet token ∈ P_data. We use the following 
additional variables: 
• pointer_u ∈ neighborSet(u) ∪ {nil} (*pointer to token*) 

• last_u ∈ neighborSet(u) (*last neighbor u received token from*) 

• queue_u[v] (*queue that either contains exactly one token packet or is empty*) 

• free_u[v] : boolean (*true if no packet in transit on link to v*) 

Note that we can use domain restriction to ensure that the outbound queue for a 
link either contains a token or is empty, by encoding the queue as a single value that 
is either token or nil. We use a queue to keep the definition of N_u compatible with 
the definition of a node automaton in Chapter 5. To make the token passing automaton 
a node automaton, we also have to pass the token in two steps: the token is first 
enqueued in the outbound queue for a link, and a second, separate action then sends 
the token from the queue to the link. This costs an extra action, but by making N_u a 
node automaton we can apply the Local Correction theorem of Chapter 5. 

Notice that the code we have given does not specify start states, because each N_u is a 
UIOA. 

Lemma 6.2.1 N = {N_u, u ∈ T} is a set of node automata for T with P_data = {token}. 

Proof: By direct checking of the definitions given earlier. ∎ 

6.3 Adding Local Checking and Correction 

To add local checking and correction, we need to define a link predicate set ℒ for 
M = Net(T, N), and to define ℒ we need to define a link predicate, L_{u,v}, for each 
edge (u,v) in T. Let havetoken(u,v) be the boolean condition that is true iff there 
is a token in any of queue_u[v], Q_{u,v}, queue_v[u], or Q_{v,u}. (Informally, havetoken(u,v) is 
true iff there is a token stored on either of the links between u and v or on the outbound 
queues for those links.) 

Intuitively, L_{u,v} should describe the legal states of the (u,v) subsystem when M is 
behaving properly and there is exactly one token in the system. Consider such a good 
state and suppose the pointer at u is pointing to v. Then either the token is in transit 
between u and v OR the pointer at v is pointing away from u. 
Enqueue_{u,v}                          (*output action to enqueue token in queue for neighbor v*) 
   Preconditions: 
      pointer_u = nil                  (*u has token?*) 
      v = Succ_u(last_u)               (*v is the next neighbor after last_u?*) 
   Effect: 
      pointer_u := v                   (*point towards where token is sent*) 
      Add token to queue_u[v]          (*store token in outbound queue*) 

SEND_{u,v}(token)                      (*output action to send token to neighbor v*) 
   Preconditions: 
      free_u[v] = true                 (*link to v free?*) 
      token is head of queue_u[v] 
   Effect: 
      free_u[v] := false               (*set to false until link says it is free*) 
      Remove token from head of queue_u[v]   (*empty outbound queue*) 

FREE_{u,v}                             (*input action: link to v says it is free*) 
   Effect: 
      free_u[v] := true 

RECEIVE_{v,u}(token)                   (*input action: token received from neighbor v*) 
   Effect: 
      If pointer_u = v then            (*token received on expected link?*) 
         pointer_u := nil              (*accept token*) 
         last_u := v                   (*update last*) 

All actions are in a separate class with upper bound t_n. 

Figure 6.2: Actions for a node u with respect to a neighbor v in the stabilizing token passing protocol. 
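To see the actions of Figure 6.2 at work, here is a small Python simulation. It is only a sketch under simplifying assumptions that are not part of the thesis: a sequential scheduler fires the first enabled action at each step, the link delivers FREE to the sender immediately after a delivery, `None` stands for nil, and `Node`, `enqueue`, `send`, `receive`, `run` and `make_tree` are hypothetical names. Starting from a good state with the token at the root, the token tours the tree in the depth-first order induced by the circular neighbor orderings.

```python
from collections import deque


class Node:
    """State of node u in Figure 6.2; None stands for nil."""

    def __init__(self, name, nbrs, pointer, last):
        self.name, self.nbrs = name, list(nbrs)     # list order = circular order for Succ_u
        self.pointer, self.last = pointer, last
        self.queue = {v: None for v in self.nbrs}    # queue_u[v]: "token" or None
        self.free = {v: True for v in self.nbrs}     # free_u[v]

    def succ(self, v):
        return self.nbrs[(self.nbrs.index(v) + 1) % len(self.nbrs)]


def enqueue(u):
    if u.pointer is not None:                 # precondition: pointer_u = nil
        return False
    v = u.succ(u.last)                        # precondition: v = Succ_u(last_u)
    u.pointer, u.queue[v] = v, "token"
    return True


def send(u, v, chan):
    if u.free[v] and u.queue[v] == "token":   # preconditions of SEND_{u,v}(token)
        u.free[v], u.queue[v] = False, None
        chan[(u.name, v)].append("token")     # the token is now in Q_{u,v}
        return True
    return False


def receive(u, v, chan, nodes):
    if not chan[(v, u.name)]:
        return None
    chan[(v, u.name)].popleft()
    nodes[v].free[u.name] = True              # simplification: FREE_{v,u} right after delivery
    if u.pointer == v:                        # token received on the expected link?
        u.pointer, u.last = None, v
        return "accepted"
    return "dropped"


def run(nodes, steps):
    """Fire the first enabled action at each step; return the nodes that accept the token."""
    chan = {(u, v): deque() for u in nodes for v in nodes[u].nbrs}
    holders = []
    for _ in range(steps):
        for u in nodes.values():
            if enqueue(u) or any(send(u, v, chan) for v in u.nbrs):
                break
            got = None
            for v in u.nbrs:
                got = receive(u, v, chan, nodes)
                if got:
                    break
            if got:
                if got == "accepted":
                    holders.append(u.name)
                break
    return holders


def make_tree():
    # Root r has children a and b; a has child c. The token starts at r.
    return {
        "r": Node("r", ["a", "b"], pointer=None, last="b"),
        "a": Node("a", ["r", "c"], pointer="r", last="r"),
        "b": Node("b", ["r"], pointer="r", last="r"),
        "c": Node("c", ["a"], pointer="a", last="a"),
    }


print(run(make_tree(), 18))   # ['a', 'c', 'a', 'r', 'b', 'r']
```

Each token hop takes three actions (Enqueue, SEND, RECEIVE), so 18 steps yield six accepted tokens, and the pattern then repeats.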
Definition 6.3.1 Local Predicates for Mutual Exclusion: We define the local 
predicate L_{u,v} of M to hold iff all the following conditions hold in the (u,v) subsystem: 

• Exactly one of (pointer_u ≠ v), havetoken(u,v), or (pointer_v ≠ u) is true. 

• There is at most one token packet in the combination of queue_u[v], Q_{u,v}, 
queue_v[u] and Q_{v,u}. 

Let ℒ be the link predicate set that consists of L_{u,v} for each edge (u,v) in T. Let L 
denote Conj(ℒ). 
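The following Python sketch transcribes Definition 6.3.1 for a single link subsystem, so that the predicate can be checked mechanically on sample states. Representing queue and channel contents as token counts and using `None` for nil are assumptions of the sketch; `LinkSubsystem` and `local_predicate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkSubsystem:
    """The (u,v) subsystem variables that L_{u,v} mentions; None stands for nil."""
    u: str
    v: str
    pointer_u: Optional[str]
    pointer_v: Optional[str]
    queue_uv: int = 0   # tokens in queue_u[v]
    chan_uv: int = 0    # tokens in Q_{u,v}
    queue_vu: int = 0   # tokens in queue_v[u]
    chan_vu: int = 0    # tokens in Q_{v,u}

    def tokens(self) -> int:
        return self.queue_uv + self.chan_uv + self.queue_vu + self.chan_vu

    def havetoken(self) -> bool:
        return self.tokens() > 0


def local_predicate(s: LinkSubsystem) -> bool:
    """L_{u,v} of Definition 6.3.1."""
    conditions = [s.pointer_u != s.v, s.havetoken(), s.pointer_v != s.u]
    return conditions.count(True) == 1 and s.tokens() <= 1


print(local_predicate(LinkSubsystem("u", "v", pointer_u=None, pointer_v="u")))            # True: token at u
print(local_predicate(LinkSubsystem("u", "v", pointer_u="v", pointer_v="u", chan_uv=1)))  # True: in transit
print(local_predicate(LinkSubsystem("u", "v", pointer_u=None, pointer_v=None)))           # False: two tokens
```

The third case shows why the predicate catches the two-token situation that plain token passing cannot detect locally: both endpoints claim to hold the token.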

Then we can show that: 

Lemma 6.3.2 The network automaton M = Net(T, N) is locally checkable for L using 
link predicate set ℒ. 

Proof: By definition, L = Conj(ℒ). We also need to show that each L_{u,v} is a closed 
predicate. Recall that we say that a state s of M satisfies L_{u,v} iff 
(s|u, s|(u,v), s|(v,u), s|v) ∈ L_{u,v}. Thus we need to show that for any transition 
(s, π, s′) of M, if s satisfies L_{u,v}, then so does s′. So assume that s satisfies L_{u,v}. 

Clearly we need only consider actions at nodes u and v since only such actions can 
affect variables of the (u,v) subsystem. Also we need only consider actions at node u, 
because the argument for actions at node v is symmetrical. 

Suppose π is an Enqueue_{u,x} event. Then in s, pointer_u = nil. Thus we can infer that 
in s, havetoken(u,v) = false and pointer_v = u. If x = v, then in s′, havetoken(u,v) = 
true, pointer_u = v and pointer_v = u. Also, in s′ there is no token in Q_{u,v}, queue_v[u] 
or Q_{v,u}. On the other hand, if x ≠ v, then in s′, havetoken(u,v) = false, pointer_u ≠ 
v and pointer_v = u. In either case, s′ satisfies L_{u,v}. 

Suppose π is a SEND_{u,x}(token) event. If x ≠ v, then this action does not affect any 
of the concerned variables. So suppose x = v. Then in s, token is the head of queue_u[v]. 
Thus we can infer that in s, pointer_u = v, havetoken(u,v) = true, and pointer_v = u. 
Also in s there is no token in Q_{u,v}, queue_v[u] or Q_{v,u}. All this action does is to move 
the token from queue_u[v] to Q_{u,v}. Thus s′ satisfies L_{u,v}. 
Empty queue_u[v]                             (*remove any token stored in queue*) 
If u is the parent of v then 
   If pointer_u = v then pointer_u := nil    (*take away token from child subtree*) 
Else                                         (*v is parent of u*) 
   pointer_u := v                            (*point towards parent*) 

Figure 6.3: Code for f(s|u, v), the reset function applied at node u with respect to neighbor v. 

Suppose π is a FREE_{u,x} event. Then this action does not change the concerned 
variables, and hence s′ satisfies L_{u,v}. 

Suppose π is a RECEIVE_{v,u}(token) event. Then in s, Q_{v,u} = token. Thus we can 
infer that in s, pointer_u = v, havetoken(u,v) = true and pointer_v = u. Also in s 
there is no token in Q_{u,v}, queue_v[u] or queue_u[v]. Thus in s′, the only changes are that 
Q_{v,u} = nil, havetoken(u,v) = false, and pointer_u = nil. Hence (pointer_u ≠ v) is the only 
one of the three conditions that holds, and s′ satisfies L_{u,v}. 

The most interesting case is if π is a RECEIVE_{x,u}(token) event with x ≠ v. We 
consider two cases. Suppose in s, pointer_u ≠ x. Then, since the code will not accept the 
token in this case, it is easy to see that in s′, the values of pointer_u, pointer_v, Q_{u,v}, Q_{v,u}, 
queue_v[u] and queue_u[v] are identical to their values in s. Hence s′ satisfies L_{u,v}. Suppose 
in s, pointer_u = x. Then, since pointer_u ≠ v, we can infer that in s, pointer_v = u and 
havetoken(u,v) = false. Also in s′, pointer_u = nil, pointer_v = u and havetoken(u,v) = false. 
Thus s′ satisfies L_{u,v}. ∎ 



The next thing we have to do is to specify a local reset function f to correct each 
(u,v) subsystem. Our idea is very simple. Let us define a partial order on pairs of 
neighbors in T such that "any edge e is less than any edge below e in T". More 
precisely, we let {u,v} ≺ {v,w} iff u is the parent of v and v is the parent of w. Also, 
{u,v} ⊀ {w,x} if the two pairs do not have a node in common. We let < be the 
transitive closure of ≺. Thus, the partial order directly reflects the structure of T. 

To allow f to be a reset function using <, we must ensure that applying f to the 
state of v with respect to a child w does not violate the stability condition for L_{u,v}, 
where u is the parent of v. This is achieved by the reset function f described in Figure 6.3. 
Lemma 6.3.3 The function f defined in Fig. 6.3 is a local reset function for network 
automaton M = Net(T, N) with respect to link predicate set ℒ and partial order <. 

Proof: Consider any edge (u,v) and suppose that u is the parent of v; the reverse 
case is symmetrical. We check the two conditions in Definition 5.3.7. 

• Correction: We need to show that (f(s|u,v), nil, nil, f(s|v,u)) ∈ L_{u,v}. Consider 
f(s|u,v). From the code in Figure 6.3 we see that in this node state, pointer_u ≠ 
v and queue_u[v] is empty. Consider f(s|v,u). From the code in Figure 6.3 
we see that in this node state, pointer_v = u and queue_v[u] is empty. Since both links 
are also empty, havetoken(u,v) = false, so (pointer_u ≠ v) is the only one of the three 
conditions that holds and there is no token. Thus (f(s|u,v), nil, nil, f(s|v,u)) ∈ L_{u,v}. 

• Stability: We only need to check the stability condition for links less than 
(u,v) in the partial order. Since v is the child of u, it suffices to show the 
following: for any child w of v, if (s|u, s|(u,v), s|(v,u), s|v) ∈ L_{u,v}, then 
(s|u, s|(u,v), s|(v,u), f(s|v,w)) ∈ L_{u,v}. But if we change the state of node v 
from s|v to f(s|v,w), queue_v[u] remains unchanged. Next, if pointer_v ≠ u in s|v, 
then pointer_v ≠ u in f(s|v,w). Similarly, if pointer_v = u in s|v, then pointer_v = u 
in f(s|v,w). Taken together, these facts imply the stability condition. 

∎ 
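A direct Python transcription of the reset function of Figure 6.3, together with a check of the Correction condition over a handful of sample states, may make Lemma 6.3.3 more concrete. A node state is reduced here to its pointer plus the set of neighbors whose outbound queue holds a token; `NodeState`, `reset` and the `parent` map are hypothetical names, and `None` stands for nil.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class NodeState:
    pointer: Optional[str]     # pointer_u; None stands for nil
    queued: FrozenSet[str]     # neighbors v whose queue_u[v] holds a token


def reset(state: NodeState, me: str, nbr: str, parent: dict) -> NodeState:
    """f(s|u, v) from Figure 6.3; parent maps each non-root node to its parent."""
    state = replace(state, queued=state.queued - {nbr})   # empty queue_u[v]
    if parent.get(nbr) == me:                            # u is the parent of v
        if state.pointer == nbr:
            state = replace(state, pointer=None)         # take token away from child subtree
    else:                                                # v is the parent of u
        state = replace(state, pointer=nbr)              # point towards parent
    return state


# Correction on edge (u, v) with u the parent of v: after resetting both ends
# (and with both links nil), pointer_u != v, pointer_v = u and no token remains,
# so exactly the first condition of L_{u,v} holds.
parent = {"v": "u"}
for pu in (None, "v", "w"):
    for pv in (None, "u", "x"):
        su = reset(NodeState(pu, frozenset({"v"})), "u", "v", parent)
        sv = reset(NodeState(pv, frozenset({"u"})), "v", "u", parent)
        assert su.pointer != "v" and sv.pointer == "u"
        assert "v" not in su.queued and "u" not in sv.queued
print("correction condition holds for all sampled states")
```

Stability can be seen the same way: resetting v with respect to a child x only touches queue_v[x] and can only change pointer_v when it equals x, so a pointer_v = u set by the correction above survives.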

Let height(T) denote the height of tree T, which is the length of the longest 
path from the root to a leaf. Clearly height(<) = height(T). 

The next lemma follows directly from the previous lemmas and the definitions: 

Lemma 6.3.4 The network automaton M = Net(T, N) is locally correctable to L 
using link predicate set ℒ and the reset function f defined in Fig. 6.3. 

The following theorem follows directly by applying the Local Correction theorem 
(Theorem 5.4.3). 

Theorem 6.3.5 Let M = Net(T, N), where N is the set of node automata defined in 
Figure 6.2, let ℒ be the link predicate set defined in Definition 6.3.1, and let f be the 
local reset function defined in Figure 6.3. Let M⁺ = Augment(M, ℒ, f). Then M⁺ 
stabilizes to the behaviors of M(t_p)|L in time t_q · height(T), where t_q and t_p are constants. 
Let the symbol * denote any node in the tree. We define the "correct" set of 
behaviors of a token passing system using a set B. Informally, B is the set of behaviors 
consisting of phases in which a token is received at a node, then enqueued at the same 
node, and then passed to a neighbor of the node. Also, we require that a token will 
be received periodically at every node. 

Definition 6.3.6 We define the set B of correct behaviors of a token passing system 
as follows. B is the set containing any behavior β that only has actions of M and such 
that: 

• For any u, after any RECEIVE_{*,u}(token) event in β the next event other than a 
FREE_{*,*} event is an Enqueue_{u,*} event. 

• For any u,v, after any Enqueue_{u,v} event in β the next event other than 
a FREE_{*,*} event is a SEND_{u,v}(token) event. 

• For any u,v, after any SEND_{u,v}(token) event in β the next event other than a 
FREE_{*,*} event is a RECEIVE_{u,v}(token) event. 

• For any u, and any suffix γ of β, a RECEIVE_{*,u}(token) event occurs within c · n time 
after the start of γ, where c is some constant and n is the number of nodes in T. 

We first argue informally that the behaviors of M(t_p)|L are in B. 

To make a verbal argument, we introduce some intuitive terminology. Let us say 
that a token is in transit between nodes u and v if havetoken(u,v) = true. We say 
that a token is at node u if pointer_u = nil. 

We first see that for any s ∈ L, there is at least one token in s. An intuitive 
explanation for this is as follows. We will define a search procedure to find a token 
in state s. Start at any node u in the tree. If pointer_u = nil then there is a token 
at u. If pointer_u = v then, from the local predicates, either there is a token in transit 
between u and v or pointer_v ≠ u. If pointer_v ≠ u we continue the search procedure 
recursively at node v. Since we never backtrack, the search procedure cannot continue 
indefinitely without encountering some leaf node w such that pointer_w ≠ x, where x 
is the parent of w. But if w is a leaf node and pointer_w ≠ x, then pointer_w = nil, and 
hence the token is at w. 
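The search procedure in this argument can be written as a short pointer-following loop. In the Python sketch below, `pointer` maps each node to a neighbor or to `None` (nil), and `in_transit` is the set of edges on which havetoken is true; both representations and the name `find_token` are assumptions. The cycle check is not needed when every L_{u,v} holds, but it shows how a bad state, such as two nodes pointing at each other with no token between them, would surface.

```python
from typing import Dict, FrozenSet, Optional, Set, Tuple


def find_token(start: str, pointer: Dict[str, Optional[str]],
               in_transit: Set[FrozenSet[str]]) -> Tuple[str, object]:
    """Follow pointers from `start` until a token is found at a node or on a link."""
    u, visited = start, set()
    while True:
        v = pointer[u]
        if v is None:                          # pointer_u = nil: the token is at u
            return ("node", u)
        if frozenset((u, v)) in in_transit:    # havetoken(u, v): the token is on this link
            return ("link", (u, v))
        if u in visited:                       # impossible when every L_{u,v} holds
            raise ValueError("pointer cycle: some local predicate is violated")
        visited.add(u)
        u = v                                  # in a good state pointer_v != u, so no backtracking


# Tree with root r, children a and b, and c a child of a; the token is at c.
pointers = {"r": "a", "a": "c", "b": "r", "c": None}
print(find_token("b", pointers, set()))       # ('node', 'c')
```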
Thus we know that there is at least one token in state s. Suppose this token is at 
node u. Then by induction on the length of the path between u and any node w ≠ u, 
it is easy to see that pointer_w ≠ nil. Similarly, for any edge (w,x), by induction on the 
length of the path between u and (w,x) we can show that the token is not in transit 
between w and x. A similar argument shows that if in s the token is in transit between 
u and v, then the token is not at any node x, nor is it in transit on any other edge 
(w,x). Hence there is exactly one token in state s. 

Once we know that there is exactly one token in any state s of M(t_p)|L, it is quite 
easy to prove that the behavior corresponding to any execution α of M(t_p)|L is in B. 

For example, consider any execution α and any state s_i in α immediately following a 
RECEIVE_{*,u}(token) event for some node u. Then it is easy to see that in s_i, pointer_u = 
nil. This predicate will continue to hold until an Enqueue_{u,*} event occurs. But if 
pointer_u = nil in a state s, then (since there can be only one token in state s) the only 
actions enabled are Enqueue_{u,*} or FREE_{*,*} events. Similar arguments can be used to 
show that the behavior of α satisfies the next two properties that characterize a behavior 
in B. However, we also need to show the fourth property of a behavior: that for any 
u, and any suffix γ of α, a RECEIVE_{*,u}(token) event occurs within c · n time after the 
start of γ, where c is some constant. This can be shown by an inductive argument using 
the properties of the Succ function. 

Thus we can show: 

Theorem 6.3.7 Let M⁺ be the automaton defined in Theorem 6.3.5. Then M⁺ stabilizes 
to the behaviors in problem B in time t_q · height(T), where t_q and t_p are constants. 

6.4 Removing Unexpected Packet Transitions 

Consider the following modification of the token passing protocol described in 
Figure 6.2, shown in Figure 6.4. The only difference is in the routine that receives a 
token at a node u from node v: we no longer check whether pointer_u = v before 
accepting the token. Let us call the resulting tree automaton M*. 

Assume, however, that we continue to use the definitions of ℒ, L, and L_{u,v} from 
Definition 6.3.1. Then it is quite easy to prove that M* is not locally checkable with 
respect to L and ℒ. This follows from the fact that L_{u,v} is not a closed predicate of 
M*: starting from a state that satisfies L_{u,v} with pointer_u = v, a token arriving at u 
from some other neighbor x causes M* to set pointer_u to nil, which falsifies L_{u,v}. 

ModifiedReceive_{v,u}(token)           (*token is received from neighbor v*) 
   Effect: 
      pointer_u := nil                 (*accept token*) 
      last_u := v                      (*update last*) 

All external actions are in a separate class with upper bound t_n. 

Figure 6.4: Modified code for a node u in a token passing protocol. The remaining code is identical to 
the code in Figure 6.2. 

Despite this, it is not hard to prove that the behaviors of M*|L are exactly the 
behaviors of M|L. In fact, if we are allowed the luxury of specifying initial states, 
M*|L is a "natural" IOA to solve the token passing problem. Suppose a protocol 
designer has started with M*|L and now wishes to construct a UIOA that stabilizes 
to the behaviors specified by the token passing problem. It is interesting to note that 
this can be done by the following two-step process: 

• Transform M* into M by adding the extra check shown in Figure 6.2. 

• Transform M into M⁺ as shown earlier. 

We would like to abstract this process. First, note that the extra check added in 
going from M* to M essentially amounts to the following heuristic: if we receive 
an "unexpected" packet p at node u from node v, then we do not process p. Notice 
that in M, a token received from v when pointer_u ≠ v (see Figure 6.5) is certainly 
"unexpected". We will formalize this intuitive notion of an "unexpected" packet below. 
However, for the present it is important to understand intuitively why such checks for 
unexpected packets are useful. Consider a transition (s, π, s′) such that s satisfies 
L_{u,v} but not L_{u,w}. Then it is quite possible that there is an "unexpected" packet on 
channel Q_{w,u} in state s. By adding checks for such packets, as in M, we can ensure 
that s′ satisfies L_{u,v}. Also note that these checks do not affect the "correct" executions 
of M|L. 
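A minimal Python sketch makes the difference concrete for a single link subsystem (u,v) in which the token is on v's side, so pointer_u = v and pointer_v ≠ u. A stray token then arrives at u from another neighbor w, which can only happen if L_{u,w} is already violated. The helper names are hypothetical, only the pointer part of each receive routine is modelled, and the at-most-one-token clause of L_{u,v} is omitted because no tokens lie on the (u,v) links in this example.

```python
from typing import Optional


def exactly_one(pointer_u: Optional[str], pointer_v: Optional[str], havetoken: bool) -> bool:
    """First clause of L_{u,v} for nodes named "u" and "v" (None stands for nil)."""
    return [pointer_u != "v", havetoken, pointer_v != "u"].count(True) == 1


def modified_receive(pointer_u: Optional[str], sender: str) -> Optional[str]:
    return None                                        # Figure 6.4: always accept the token


def checked_receive(pointer_u: Optional[str], sender: str) -> Optional[str]:
    return None if pointer_u == sender else pointer_u  # Figure 6.2: drop unexpected tokens


pointer_u, pointer_v = "v", None                       # token is on v's side of the (u,v) link
assert exactly_one(pointer_u, pointer_v, havetoken=False)
print(exactly_one(modified_receive(pointer_u, "w"), pointer_v, False))  # False: L_{u,v} broken
print(exactly_one(checked_receive(pointer_u, "w"), pointer_v, False))   # True: packet ignored
```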
Figure 6.5: Receiving a token on a link that is not being pointed to is an unexpected packet transition. In 
general, an unexpected packet transition is a packet reception that could never have occurred if the receiving 
link subsystem were in a good state. 

We now formalize these observations. First, we define the notion of weak local 
checkability. Intuitively, we remove the requirement that each L_{u,v} be a closed 
predicate and instead require only that L be a closed predicate. 

Definition 6.4.1 A network automaton M is weakly checkable for predicate L using 
link predicate set ℒ if: 

• ℒ is a link predicate set for M and L ⊇ Conj(ℒ). 

• For any transition (s, π, s′) of M, if s ∈ L then s′ ∈ L. 

Ideally, we would like to prove that any weakly checkable protocol can be transformed 
into an "equivalent" locally checkable protocol. While we do not know how 
to do this in general, we can obtain such a result if the automaton is also locally 
extensible. Intuitively, an automaton is locally extensible with respect to a link predicate 
set if any pair of "good" adjacent link subsystem states can be extended to form 
a "good" state of the entire automaton. 

Definition 6.4.2 A network automaton M = Net(G, N) is locally extensible with 
respect to link predicate set ℒ = {L_{u,v}} if the following condition is true: 

For any two adjacent edges (u,v) and (v,w) in G, if x ∈ L_{u,v} and y ∈ L_{v,w}, then 
there is some state s ∈ Conj(ℒ) such that x = (s|u, s|(u,v), s|(v,u), s|v) and 
y = (s|v, s|(v,w), s|(w,v), s|w). 

To transform a locally extensible and weakly checkable protocol M* into an "equivalent" 
locally checkable protocol, we will add checks to M* for "unexpected" packets. 
We formalize this notion of an "unexpected packet" (see Figure 6.5) as follows: 
Definition 6.4.3 Consider a network automaton M with link predicate set ℒ = {L_{u,v}} 
and some pair of neighbors u,v in M. We say that a transition (s, π, s′) is an unexpected 
packet transition at u with respect to v if: 

π is a RECEIVE_{v,u}(p) event and there is no a, b such that (s|u, a, p, b) ∈ L_{u,v}. 

For example, in Figure 6.2, a (s, RECEIVE_{v,u}(token), s′) transition with s.pointer_u ≠ 
v is an unexpected packet transition at u with respect to v. We can now state a simple 
theorem. 

Theorem 6.4.4 Consider a network automaton M* = Net(G, N) that is weakly checkable 
for predicate L with respect to predicate set ℒ. Suppose further that M* is locally 
extensible with respect to ℒ. Then it is possible to construct another automaton M 
such that: 

• M is an automaton for graph G and the executions of M|L are identical to the 
executions of M*|L. 

• M is locally checkable for predicate L with respect to predicate set ℒ. 

Proof: We transform M* into M by replacing all unexpected packet transitions (s, π, s′) 
in M* by the null transition (s, π, s). Then we use the local extensibility property to 
show that each L_{u,v} is a closed local predicate of M. ∎ 
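The construction in this proof can be pictured as wrapping each receive handler of M* with a test for unexpected packets. The Python sketch below assumes, purely for illustration, that the candidate contents a of Q_{u,v} and node states b of v range over small finite sets that can be enumerated; the text does not say how the test is computed, and `guard_receive`, `token_local_ok` and the reduced node state (just pointer_u, with outbound queues assumed empty) are hypothetical. Instantiated with the token passing predicate, the generic test reduces to the check pointer_u = v of Figure 6.2.

```python
from itertools import product
from typing import Callable, Iterable


def guard_receive(receive: Callable, local_ok: Callable,
                  chan_values: Iterable, node_values: Iterable) -> Callable:
    """Turn unexpected packet transitions of `receive` into null transitions.

    receive(s_u, v, p) -> new state of u (the unguarded handler of M*).
    local_ok(v, s_u, a, p, b) -> True iff (s_u, a, p, b) is in L_{u,v}.
    """
    chan_values, node_values = list(chan_values), list(node_values)

    def guarded(s_u, v, p):
        expected = any(local_ok(v, s_u, a, p, b)
                       for a, b in product(chan_values, node_values))
        return receive(s_u, v, p) if expected else s_u    # null transition
    return guarded


def token_local_ok(v, pointer_u, a, p, pointer_v):
    # L_{u,v} for node "u" with outbound queues assumed empty; a and p are channel contents.
    tokens = [a, p].count("token")
    return [pointer_u != v, tokens > 0, pointer_v != "u"].count(True) == 1 and tokens <= 1


def unguarded_receive(pointer_u, v, p):
    return None                                           # accept unconditionally (Figure 6.4)


guarded_token_receive = guard_receive(unguarded_receive, token_local_ok,
                                      chan_values=[None, "token"],
                                      node_values=[None, "u", "x"])
print(guarded_token_receive("v", "v", "token"))   # None: expected token accepted
print(guarded_token_receive("w", "v", "token"))   # w: unexpected token ignored
```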



6.5 Tree Correction Theorem 

The reader who has read Chapter 4 might suspect that any locally checkable protocol 
on trees can be made locally correctable. Thus for tree automata it seems plausible that 
we do not need the stronger hypothesis that the tree automaton be locally correctable. 
This is indeed true. Compare the following theorem to Theorem 4.3.1 (but be aware 
that Theorem 4.3.1 applies to shared memory tree automata) and the Local Correction 
theorem (Theorem 5.4.3). 

Theorem 6.5.1 Tree Correction: Consider any tree automaton 𝒯 = Net(T, N) 
that is locally checkable for L using link predicate set ℒ. Then there exist a UIOA 𝒯⁺ 
for graph T and constants c and c′ such that 𝒯⁺ stabilizes to the behaviors of 
𝒯(c)|L in time c′ · height(T). 
The proof of this theorem is extremely similar to that of Theorem 4.3.1 of Chapter 4. 
However, since we can no longer do snapshots and resets atomically, we need to 
use the local snapshot and reset protocols defined in Chapter 5. Thus we have to 
add actions for local checking and correction as in the proof of the Local Correction 
theorem, Theorem 5.4.3. However, there are two differences. First, for every (u,v) 
subsystem in which v is the parent of u, the child u performs the checking and 
correction. Second, while the local snapshot protocol is identical to the protocol of 
Chapter 5, the local reset protocol is a little different. This is sketched in Figure 6.6, 
which should be compared with the right-hand diagram in Figure 5.9. 

The basic idea is that the reset response carries the state b of the parent at the 
instant the response was sent. When the child gets the response, the child sets its 
state to f(b), where f is a function that we describe next. Basically, f is chosen such 
that for every state b of the parent node v, (f(b), nil, nil, b) ∈ L_{u,v}. In other words, f 
is chosen so that we can reset the link subsystem to a good state by only changing 
the state of the child node. Of course, that is the basic idea in the proof of 
Theorem 4.3.1. The only tricky part is to argue that we can find such a function f. 
The proof is again similar to the proof of Theorem 4.3.1: we first normalize the original 
protocol 𝒯 to get rid of "useless" node states that can never occur in global states in 
which all local predicates are true. We then show that the required function f exists 
for the normalized protocol. 
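For the token passing protocol, one function f with this property can be written down by hand. The Python sketch below is such a choice and is only an illustration, not the function produced by the normalization argument. The names `child_reset` and `holds`, the reduction of node states to pointers, and the use of `None` for nil are assumptions, and the child's outbound queue to the parent is assumed to be emptied as part of the reset.

```python
from typing import Optional


def child_reset(parent_pointer: Optional[str], child: str, parent: str) -> Optional[str]:
    """New pointer for `child`, given the parent's pointer b carried in the reset response.

    Chosen so that (f(b), nil, nil, b) satisfies L_{child,parent} with empty queues.
    """
    if parent_pointer == child:
        return None      # parent believes the token is in our subtree: hold it here
    return parent        # otherwise the token is elsewhere: point up to the parent


def holds(pointer_child: Optional[str], pointer_parent: Optional[str],
          child: str, parent: str) -> bool:
    # Exactly-one clause of L_{child,parent}; the links and queues are empty after a reset.
    return [pointer_child != parent, False, pointer_parent != child].count(True) == 1


for b in (None, "u", "w"):                 # sampled pointer values of the parent v
    assert holds(child_reset(b, "u", "v"), b, "u", "v")
print("(f(b), nil, nil, b) satisfies L_{u,v} for every sampled b")
```

Pointing the child at one of its own children, instead of nil, would satisfy the same condition; the sketch simply picks the nil option.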

The Tree Correction theorem is not a corollary of the Local Correction theorem. 
Recall from Chapter 5 that when we applied a local reset function to the state of leader 
node u with respect to node v, the resulting state of node u could only depend on the 
previous state of node u. However, in the proof of the Tree Correction theorem we 
require the resulting state of node u to depend on the previous state of node v! 

Finally, we note that we could have derived the stabilizing mutual exclusion pro- 
tocol by showing that it was locally checkable and then using the Tree Correction 
theorem directly. 

6.6 Summary 

In Chapter 4, we showed (in a shared memory model) that any locally checkable 
protocol on a tree could be stabilized in time proportional to the height of the tree. 
Figure 6.6: Sketch of a reset phase used for Tree Correction. Node v is the parent of node u in the tree. 

This chapter shows that this theorem can be extended to the message passing model 
(Tree Correction theorem, Theorem 6.5.1). We also described a simple application: 
the problem of token passing on a tree. A stabilizing solution to this problem can be 
derived using either the Local Correction or the Tree Correction theorem. 

The token passing problem suggests a simple strategy for stabilizing protocols. 
First, we try to add a small amount of state to make the protocol locally checkable. 
Recall that we added a pointer to each node for this purpose in the token passing 
protocol. Then we combine the original protocol with another protocol that computes a 
spanning tree. Finally, we do local correction on the resulting spanning tree. Although 
we have not done so, it is important to formally describe how an arbitrary protocol P 
could be composed with a spanning tree protocol so that the net effect is the same as 
if P were running on the final tree. 

Local checkability requires that each local predicate be a closed predicate. Removing 
unexpected packet transitions (Figure 6.5) is a useful heuristic that is often 
sufficient to ensure that each local predicate is closed. 

The token passing protocol on a tree can be generalized to passing a constant 
number of tokens on a tree. In this case, we replace the pointer variable pointer_u at
each node u by a variable token_count_u[v] (one for each neighbor v) that keeps track
of the number of tokens that are in the direction of neighbor v. The local predicates
must also be suitably modified.
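The modified local predicates are not spelled out here. The following Python sketch shows one plausible choice, under loudly stated assumptions: there are k tokens, no token is in transit on any link, and a hypothetical map held[u] gives the number of tokens at node u. The check requires token_count_u[v] + token_count_v[u] = k on each link (u, v), and requires each endpoint's own counts to add up to k. All names are illustrative, not the thesis's.

```python
from typing import Dict, List, Tuple

Tree = Dict[str, List[str]]  # adjacency lists of an undirected tree


def tokens_beyond(tree: Tree, held: Dict[str, int], u: str, v: str) -> int:
    """Number of tokens held in the part of the tree reached from u via neighbor v."""
    total, stack, seen = 0, [v], {u, v}
    while stack:
        x = stack.pop()
        total += held[x]
        for y in tree[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return total


def legal_counts(tree: Tree, held: Dict[str, int]) -> Dict[str, Dict[str, int]]:
    """token_count_u[v] for every node u and neighbor v, assuming no tokens in transit."""
    return {u: {v: tokens_beyond(tree, held, u, v) for v in tree[u]} for u in tree}


def violated_links(tree: Tree, held: Dict[str, int],
                   count: Dict[str, Dict[str, int]], k: int) -> List[Tuple[str, str]]:
    """Link subsystems (u, v), u < v, whose hypothetical local predicate fails."""
    # Node part: tokens at u plus tokens in every direction from u must total k.
    node_ok = {u: held[u] + sum(count[u].values()) == k for u in tree}
    bad = []
    for u in tree:
        for v in tree[u]:
            # Link part: tokens beyond v (seen from u) plus tokens beyond u (seen from v).
            if u < v and not (node_ok[u] and node_ok[v]
                              and count[u][v] + count[v][u] == k):
                bad.append((u, v))
    return bad
```

A single corrupted counter is caught by the link subsystem that contains it, which is the behavior local checkability asks for.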
Chapter 7

Stabilizing Network Reset 



The stabilizing network reset protocol described in this chapter is the link between the 
previous two chapters (which were about Local Correction) and the last two chapters 
of the thesis (which are about Global Correction). 

The major service provided by a network reset protocol is synchronization of the 
nodes in a network. In the first section of this chapter, we informally introduce the 
concept of synchronization, and discuss why this service is useful. In Section 7.2 we 
review existing reset protocols. Then in Section 7.3 we specify the reset problem. 
Previous specifications of reset protocols have been state-based. Our specification 
of the reset problem is novel in that it is based on external behaviors. In the next 
section (Section 7.4), we give an overview of our reset protocol. Then in Section 7.5, 
we formally specify our reset protocol using a reset automaton. 

Sections 7.6 and 7.7 are devoted to showing that the reset automaton is a stabilizing 
solution to the reset problem. We do this using the Local Correction theorem developed 
in Chapter 5. We show that the reset automaton is locally checkable by exhibiting a 
set of closed local predicates for the reset automaton. Then we show that the reset 
automaton is locally correctable by showing that the local predicates are independent 
- i.e., the partial order that formalizes the dependency relation between the predicates 
is the trivial partial order. Thus the reset protocol stabilizes in constant time. Next,
in Section 7.7, we show that once the reset automaton is in a state that satisfies
all local predicates, then the behaviors exhibited by the reset automaton are indeed 
the behaviors specified by the reset problem. This completes the proof that the reset 
automaton is a stabilizing solution to the reset problem. 
The last two sections of the chapter try to abstract some general principles based
on the example of the stabilizing reset protocol. The reset protocol in this chapter is 
based on an existing non-stabilizing reset protocol [AAG87] that works in networks 
where links can fail and recover. Section 7.8 suggests that this is no accident - locally 
checkable protocols that work in networks where links can fail and recover are likely 
to also be locally correctable. 

Because this chapter is very long, we offer two suggestions for reading. First, 
constantly consult the roadmaps at the beginning of the chapter and each long section. 
Second, it is hard to appreciate the specification of the reset problem until one sees why 
it is useful. Chapters 8 and 9 describe important applications of the reset protocol. 
Readers may prefer to read Chapter 8 after reading the specification in Section 7.3 
and before reading the rest of Chapter 7. 



7.1 Synchronization 

A reset protocol is used to synchronize all the nodes in a network. Before we describe 
what we mean by synchronizing all the nodes in a network, it is helpful to understand a 
form of synchronization between a pair of nodes in a network. Such synchronization is 
provided by a Data Link protocol [Spi88a]. We will then see that in essence a network 
reset protocol generalizes the guarantees of a Data Link protocol to multiple nodes in 
a network. 



7.1.1 Data Link Synchronization 

Consider a pair of neighboring nodes in a network, say u and v. Suppose the physical
link between u and v can periodically crash and recover. The Data
Link protocol is responsible for providing notification (as to whether the link is up or
down) to the users at u and v. Thus at each node, the Data Link protocol issues Link
up and Link down actions to signify that the link is operational or not operational 
respectively. If the network is asynchronous, it is impossible to provide a Link down 
event (or Link up event) at exactly the same instant at both u and v. But if Link up 
and down events are reported independently (and possibly at different times) at both 
ends, the Data Link must provide some additional functions to synchronize u and v. 
The synchronization requirements for a Data Link protocol can be stated elegantly
[Spi88a] as follows. First, for node u (or node v) we can define an operating interval at 
node u (or at node v) to be the time from a Link up event at that node until the next 
Link down event at that node. If an operating interval does not end with a Link down 
event we say that the interval is a final interval. Thus each execution of the Data Link 
protocol induces a set of operating intervals at both nodes. Then for synchronization, 
we require that there is a symmetric relation between intervals at the two nodes (called 
a mating relation in [AE86]) such that: 

• An operating interval can be mated to at most one other operating interval. 

• Suppose an operating interval α at u is mated to an operating interval β at
v. Then the sequence of messages received by v in β must be a prefix of the
sequence of messages sent by u in α. (This is often called the prefix property of
Data Link protocols.) Also, if α is a final interval, then so is β, and in this case
the sequence of messages received by v in β must be identical to the sequence of
messages sent by u in α.

• Suppose an operating interval α at u is mated to an operating interval β at v
and an operating interval α' at u is mated to an operating interval β' at v. Then
if α' occurs later than α, then β' occurs later than β.
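To make the three conditions concrete, here is a minimal Python sketch that checks a candidate mating between the operating intervals at u and v. The Interval record and the is_valid_mating function are hypothetical. Each interval simply records, in order, the messages sent to and received from the peer.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Interval:
    final: bool = False                                # True if no Link down event ends it
    sent: List[str] = field(default_factory=list)      # messages sent to the peer
    received: List[str] = field(default_factory=list)  # messages received from the peer


def is_valid_mating(at_u: List[Interval], at_v: List[Interval],
                    mates: Dict[int, int]) -> bool:
    """Check the three Data Link conditions for a candidate mating.

    at_u and at_v list the operating intervals at u and v in time order, and
    mates maps the index of an interval at u to the index of its mate at v.
    """
    # At most one mate: the dict gives each u-interval one mate; no v-interval is reused.
    if len(set(mates.values())) != len(mates):
        return False
    for a, b in mates.items():
        alpha, beta = at_u[a], at_v[b]
        # Prefix property; the relation is symmetric, so check both directions.
        for sent, recv in ((alpha.sent, beta.received), (beta.sent, alpha.received)):
            if recv != sent[:len(recv)]:
                return False
        # A final interval mates only with a final interval, and nothing is lost.
        if alpha.final != beta.final:
            return False
        if alpha.final and (beta.received != alpha.sent or alpha.received != beta.sent):
            return False
    # Temporal ordering: a later interval at u has a later mate at v.
    pairs = sorted(mates.items())
    return all(b1 < b2 for (_, b1), (_, b2) in zip(pairs, pairs[1:]))
```

For instance, a mating of the second interval at u to the second at v and the final third interval at u to the final fourth interval at v (as in the figure described next) passes, provided the message sequences satisfy the prefix and final-interval conditions.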

The mating relation for two nodes u and v is sketched using the time-space diagram 
shown in Figure 7.1. We have depicted Link up events by horizontal lines and Link
down events by crosses. An arrow from v to u
depicts a packet sent by v that is successfully delivered at u, and vice versa. An arrow 
from v that does not reach u is a packet sent by v that is not delivered at u. In the 
figure, the second operating interval at u mates to the second operating interval at v. 
Also, the third operating interval at u mates to the fourth operating interval at v and 
the two intervals are final intervals. Notice that the sequence of packets received by 
u in its second interval is a prefix of the sequence of packets sent by v in its second 
interval. Also notice that all packets sent in the two final intervals are delivered. 

Why does this mating relation provide a useful form of synchronization? It does so
because the mating relation guarantees that the behavior of a node during an
operating interval is what might have occurred in some asynchronous execution of the
Data Link protocol in which there were no link failures. This is a crucial abstraction
that allows users of the Data Link protocol to deal with failures in a simple way.

Figure 7.1: Illustrating the Mating Relation

Almost identical forms of synchronization are provided by virtual circuit protocols 
([Spi88a]) and transport protocols. Thus all these protocols essentially synchronize 
two nodes in a network. 

7.1.2 Network Synchronization 

The synchronization provided by a network reset protocol is a generalization of the 
synchronization guarantees of a Data link. A reset protocol synchronizes all the nodes 
in the network. Informally, the reset problem [Fin79] is to design a reset service that 
can be superimposed on any other distributed protocol P. The reset may be invoked 
at any node, and its effect is to output a signal at all the nodes of the system in 
a consistent way. We use S-messages to denote the messages sent and received by 
protocol P. 

By consistent, we mean the following. As in the Data Link protocol, the reset 
protocol induces signal intervals at each node (i.e., intervals between consecutive signal 
events at a node). Then for each pair of neighbors u and v we require that the signal 
intervals at u and v can be mated as described earlier. For example, if we replaced the 
Link up events with Signal events in Figure 7.1 and ignored Link down events, then 
Figure 7.1 could equally well describe the pairwise mating relation provided by a reset
protocol. Notice, however, that in a Data Link the Link up events occur independently
on each link, and so the operating intervals on each link adjacent to a node u can
be different. In a Reset protocol, by contrast, the Signal events induce signal intervals on
a per-node basis.

For example, this means that the sequence of S-messages sent by any node u to
any neighbor v after the last signal at u must be equal to the sequence of S-messages 
received by v after its last signal. 

Why is this called a reset service? Suppose S_u defines the set of "legal" start states
for every node u. Suppose the "legal executions" of P are those executions in which
the initial state of every node u belongs to S_u and the initial state of every channel
is a state in which the channel contains no S-messages. It is natural to define the
legal states of P as those states that occur in legal executions of P. Suppose we now
superimpose a reset service over P. Suppose also that whenever a node u receives a
signal, we locally reset the state of node u (i.e., the local state of protocol P at node
u) to some state in S_u. Then after every node has received its last signal, protocol P
is in a legal state. In other words, the signals provide a consistent time point for every
node u such that we can globally reset protocol P to a legal state by locally resetting
each node u at its time point. The time point for each node is the instant it receives
a signal.

Thus a global reset service is much like the reset button on a computer. After the 
reset button is pushed, the computer is restored to a "good" state. However, globally 
resetting a network to a "good" state is much more challenging than resetting a single 
node. In a network, reset requests can arrive at any node and the signals must be 
delivered at every node in a consistent fashion. Ideally, we would like the signals to be 
delivered to every node at the same instant. However, since this seems to be impossible 
in a distributed system, we settle for delivering the signal in a consistent manner (see 
above). 

Why is a reset service useful? It was introduced in [Gal76, Fin79] as a tool for 
converting any protocol that works in a so-called static network to work in a so-called 
dynamic network. A static network, as the name suggests, is a network in which 
the topology of the network remains fixed during the execution of the algorithm. A 
dynamic network, by contrast, is a network in which nodes and links can crash, thereby 
changing the topology. However, it is assumed that topology changes eventually stop 
and that some node in the final topology can detect the fact that there has been
a topology change. Roughly, the idea behind [Fin79] is that any node that detects 
a topology change makes a reset request. If successful, the reset request results in 
restarting the static protocol. This methodology is quite practical. For instance, the 
Autonet local area network uses a version of [Fin79] to cope with failures. 

Besides its use in dynamic networks, a reset protocol is also useful in a stabilizing 
setting. As we show in Chapters 8 and 9, a stabilizing reset protocol is an important tool 
for the design of other stabilizing protocols. Notice that in order to use the reset service 
in dynamic networks [Fin79], some node must detect the last topological change. More 
generally, suppose that any bad state of a network protocol can be detected locally by 
some node. This corresponds to what we have called a locally checkable protocol in 
Chapter 5. As we will see in Chapter 8, any locally checkable protocol can be stabilized 
using a stabilizing reset protocol. Intuitively, our idea is similar to that of [Fin79]. We 
add actions to each node to make a reset request if the node locally detects a bad state 
of the network. 
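A minimal Python sketch of this idea follows. It assumes that a node can evaluate its local predicates on its own state. In the thesis, local checking is done per link subsystem, which also involves the neighbor's state, and that coordination is omitted here. The class name, the callbacks, and the deep copy of the start state are all illustrative assumptions, not the construction of Chapter 8.

```python
import copy
from typing import Callable, Generic, List, TypeVar

S = TypeVar("S")  # type of the local state of protocol P at one node


class ResetWrappedNode(Generic[S]):
    """One node of a protocol P superimposed on a reset service (hypothetical sketch)."""

    def __init__(self, name: str, start_state: S,
                 local_checks: List[Callable[[S], bool]],
                 request_reset: Callable[[str], None]) -> None:
        self.name = name
        self.start_state = start_state      # some state in S_u
        self.state = copy.deepcopy(start_state)
        self.local_checks = local_checks    # predicates this node can evaluate locally
        self.request_reset = request_reset  # stands in for the REQUEST_u input action

    def check(self) -> None:
        """Periodic local check: request a reset if any local predicate fails."""
        if not all(pred(self.state) for pred in self.local_checks):
            self.request_reset(self.name)

    def on_signal(self) -> None:
        """SIGNAL_u output: locally reset P's state at u to a legal start state."""
        self.state = copy.deepcopy(self.start_state)
```

The deep copy keeps later mutations of the node's state from corrupting the stored start state, so every signal restores the same element of S_u.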

The technique of using a reset protocol to stabilize a locally checkable protocol 
is quite different from the techniques developed in Chapters 5 and 6. In Chapter 5,
a locally checkable and correctable protocol is stabilized by doing independent local 
resets of each link subsystem. In Chapter 6, a locally checkable protocol on a tree is 
also stabilized by doing independent local resets of each link subsystem. 

By contrast, we can stabilize a locally checkable (but perhaps not locally cor- 
rectable) protocol by doing a coordinated global reset of the entire network. As one 
might guess, there is a performance penalty in using a network reset. The stabilization 
time of a solution that uses local correction is proportional to the height of the under- 
lying partial order (see Local Correction Theorem, Theorem 5.4.3). The stabilization 
time of a solution that uses tree correction is proportional to the height of the tree 
(see Tree Correction Theorem, Theorem 6.5.1). 

However, the stabilization time of a solution that uses global correction is propor- 
tional to the number of nodes in the network. Because the height of the partial order 
(or the height of a tree) typically is smaller than the number of nodes, we have the 
following rule of thumb. If a protocol is locally correctable or runs on a tree, then it 
pays to use the techniques of Chapter 5 or 6. However, if a locally checkable protocol 
cannot be shown to be locally correctable, then the network reset approach provides 
a (perhaps less efficient) stabilizing solution. It is perhaps elegant that the network 
reset protocol is itself stabilized using the local reset approach of Chapter 5.

In this chapter we introduce the most efficient known stabilizing network reset 
protocol. We do so by applying the method of local checking and correction of Chapter 
5 to an existing reset protocol described in [AAG87]. The space overhead of the protocol
is logarithmic. Our reset protocol stabilizes in constant time. 



7.2 Existing Solutions 

In Chapter 4, we described a stabilizing reset protocol due to Arora and Gouda [AG90]. 
Their protocol was described in terms of a shared memory model but it appears that 
it can be adapted to work in our message passing model. [AG90] also describes a 
stabilizing protocol to build a spanning tree of the network. For the spanning tree 
protocol, it is assumed that processes have unique identifiers, and that there is some 
a priori bound K on the number of nodes in the network. The IDs and K cannot be 
corrupted by transient errors. The stabilization time of the spanning tree protocol is 
O(K), where K is a non-volatile bound on the number of nodes in the network. Their
protocol will also work correctly if K is an upper bound on the diameter of the final 
network. However, in a network in which the topology can change arbitrarily, often 
the only reasonable bound on the diameter of the network is a bound on the number 
of nodes in the network. Secondly, as we will see in Chapter 8, their spanning tree 
protocol is based on a routing algorithm that works poorly in practice. 

Katz and Perry [KP90] describe a stabilizing reset protocol. Their approach requires
the election of a leader, and the stabilization time of their approach is O(n²),
where n is the number of nodes in the network.

On the other hand, the reset protocol we describe does not require node IDs, and
takes O(n) time to stabilize, where n is the actual number of nodes in the network.
Our protocol does not require the computation of a spanning tree. Thus it can be
used to build a stabilizing spanning tree protocol, as we show in Chapter 8.
7.3 Specifying the Desired Behaviors of a Reset Protocol

This section is divided into four subsections. First, we describe the external interface 
to the reset protocol. Then, we give an overview of the specification and some of 
its difficulties. After this motivation, we formally specify the reset problem and then 
briefly discuss alternative specifications. 

7.3.1 Interface to Reset Protocol 

We first describe the external interface to the reset protocol. 

A reset service is modelled as a network automaton R = Net(G, N). The external
interface for any node u in the network is shown in Figure 7.2. We have the usual
interfaces to send and receive packets between neighbors as described in Chapter 5.
However, in addition each node also has interfaces to send and receive messages on
behalf of external users of the reset service. We assume that every message m is drawn
from some message alphabet S, and that S ⊆ P_data, where P_data is the data packet
alphabet used by the network automaton. Intuitively, the messages sent by users to
the reset service will be relayed between nodes of the network using packets.

Thus node u also has an input action SENDM_{u,v}(m) by which an external user can
send a message to neighbor v. Similarly, node u has an output action RECEIVEM_{v,u}(m)
by which the reset service can deliver a message m from the user at node v to the user
at node u. There is also a FREEM_{u,v} output action that is used by the reset service to
indicate that it is ready to accept another message from node u to node v. Thus the
external interface between a reset service and its users essentially mimics the interface
offered by a UDL (see Chapter 5) except that packets are replaced by messages. This
is an important property that will be exploited later.

However, the interface between a user and a reset service is richer than the interface
between a user and a set of UDLs. This is because the reset service at node u offers
two additional actions: an input action REQUEST_u and an output action SIGNAL_u.
Intuitively, the REQUEST_u action is used by the user at node u to request a network
reset, and the SIGNAL_u action is used to inform the user at node u that a reset has
been completed. We will refer to any event of the form SIGNAL_u as a signal event and
any event of the form REQUEST_u as a request event.

Figure 7.2: Interface specification for reset service

7.3.2 Difficulties in Specifying the Reset Problem 

The ideal behaviors of a non-stabilizing reset protocol can be specified in terms of 
three properties: timeliness, consistency, and causality. Intuitively, a behavior is 
timely if, in the absence of reset requests, free events are delivered in constant time 
and sent messages are delivered in constant time. A behavior is consistent if there is a 
symmetrical mating relation between signal intervals at neighbors. Finally, a behavior 
is causal if reset signals are only caused by reset requests and reset requests result in 
reset signals. 

As in a UDL, we cannot guarantee that any message sent by a user from, say, u to
v will be received at v unless (see Figure 7.2) this message is sent after u performs a
FREEM_{u,v} action. If this is true (and no other message is sent from u to v between
the free action and the send action), then we will say that the send action is safe. We
will only require delivery of messages corresponding to safe message send events. The
specification will allow other messages to be dropped.
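On the user side, one simple way to ensure that every send is safe is to queue outgoing messages and release at most one message per FREEM_{u,v} notification. The Python sketch below assumes hypothetical callbacks. The sendm callback stands in for the SENDM_{u,v} input action, and on_free is invoked on each FREEM_{u,v} output.

```python
from collections import deque
from typing import Callable, Deque, Hashable


class SafeSender:
    """User-side queue at node u for neighbor v (hypothetical sketch).

    Issues SENDM_{u,v}(m) only after a FREEM_{u,v}, and at most once per
    FREEM_{u,v}, so every send it performs is safe in the sense of the text.
    """

    def __init__(self, sendm: Callable[[Hashable], None]) -> None:
        self.sendm = sendm          # stands in for the SENDM_{u,v} input action
        self.pending: Deque[Hashable] = deque()
        self.free = False           # True between a FREEM_{u,v} and the next send

    def submit(self, m: Hashable) -> None:
        self.pending.append(m)
        self._try_send()

    def on_free(self) -> None:      # FREEM_{u,v} output of the reset service
        self.free = True
        self._try_send()

    def _try_send(self) -> None:
        if self.free and self.pending:
            self.free = False       # consume this FREEM_{u,v}
            self.sendm(self.pending.popleft())
```

Messages submitted while the link is not free simply wait, so the user never issues two sends for one free event.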

Any specification of the behaviors of a stabilizing reset protocol has to take into 
account three anomalies that do not occur in the specification of a non-stabilizing reset 
protocol: 

• The first message sent on any link may not be safe in that it may not be preceded 
by a free event. 

• Some of the initial messages that are delivered may be abnormal in that they 
do not correspond to any messages sent. It appears that no stabilizing reset 
protocol can guarantee that there is some suffix of every behavior that contains 
no abnormal message delivery. 1 

• Some of the initial signal actions may not be "caused" by any reset request. 



1 A stabilizing reset protocol can begin in a state in which all links can have arbitrary messages
stored. There can be executions that contain no state in which all links are empty of messages. Any 
suffix of such an execution will have abnormal message deliveries. 
We handle these difficulties as follows. We add the following condition to the
timeliness property: within linear time of the start of any behavior of the reset protocol,
all received messages are normal, i.e., correspond to some message sent. We weaken
the mating relation so that it is possible to receive abnormal messages in a signal
interval; however, normal messages received in a signal interval must have been sent
in a mated interval. Finally, we relax the causality property and ask only that it
hold after linear time from the start of a behavior. Precise definitions are
given in the next subsection.

7.3.3 Formal Specification of Reset Problem 

Recall the following notation. When we say that an event a_i occurs within time t
in β, we mean that a_i.time ≤ β.start + t. When we say that a time t is a constant,
we mean that t = c·t_l + c'·t_n, where c, c' are some scalar constants and t_n and t_l are
the default node and link delays respectively. In this and following chapters, we will
also use the following notation borrowed from complexity theory. When we say that
t = O(n), we mean that t ≤ c·n, where c is some constant time and n is the number
of nodes in the graph G. This reflects a linear dependence on the size of the input,
if we consider the input to be the network graph. If we say that an event a_i occurs
within O(n) time in a behavior β, we mean that a_i.time − β.start = O(n). If we say
that an event a_j occurs within O(n) time after an earlier event a_i in behavior β, we
mean that a_j.time − a_i.time = O(n); in this case, we will also say that event a_i occurs
within O(n) time before event a_j in behavior β. Sometimes we will say linear time to
mean an interval of time that is O(n) in duration.

Recall that a message sent by a user from u to v need only be delivered if the send
action is safe, i.e., if it occurs after a FREEM_{u,v} action with no other message sent
from u to v in between. We formalize this restriction by defining what it means for a
message send event to be safe.

Definition 7.3.1 Consider any behavior β of a reset protocol that has the interface
shown in Figure 7.2. We say that an action a_j = SENDM_{u,v}(m) is safe in β if:

• There is some FREEM_{u,v} action in β before a_j, and

• No other action of the form SENDM_{u,v}(*) occurs between a_j and the FREEM_{u,v}
action.
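Definition 7.3.1 can be checked mechanically on a finite behavior. The sketch below assumes a hypothetical encoding of a behavior as a Python list of (kind, src, dst, msg) tuples. It scans backwards from the send and reports the send as safe exactly when the most recent FREEM_{u,v} comes after every earlier SENDM_{u,v}, which is equivalent to the two conditions of the definition.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding of a behavior: (kind, src, dst, msg) tuples, e.g.
# ("FREEM", "u", "v", None) or ("SENDM", "u", "v", "m1").
Event = Tuple[str, str, str, Optional[str]]


def is_safe_send(beta: List[Event], j: int) -> bool:
    """Definition 7.3.1: is beta[j] = SENDM_{u,v}(m) safe in beta?"""
    kind, u, v, _ = beta[j]
    assert kind == "SENDM", "beta[j] must be a SENDM_{u,v}(m) event"
    for kind_i, src, dst, _ in reversed(beta[:j]):
        if (src, dst) != (u, v):
            continue              # event on another link or in the other direction
        if kind_i == "FREEM":
            return True           # nearest relevant event is a free: the send is safe
        if kind_i == "SENDM":
            return False          # another send intervenes after the last free
        # RECEIVEM_{u,v} events (deliveries at v) do not affect safety
    return False                  # no FREEM_{u,v} precedes a_j
```

The final `return False` covers the first anomaly listed earlier: an initial send that is not preceded by any free event.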

Clearly we would eventually like every message that is received at u from say v to 
correspond to some message sent from v to u. We will require that there is at most 
one message in transit from v to u. This makes it easy to match received messages
with sent messages. Thus:

Definition 7.3.2 We say that event a_k = RECEIVEM_{v,u}(m) in β is a normal receive
at u from v iff:

• There is some a_j = SENDM_{v,u}(m) in β that occurs before a_k in β, and

• There is no RECEIVEM_{v,u}(*) event between a_j and a_k in β.

We will refer to the earliest SENDM_{v,u}(m) in β that occurs before a_k (and satisfies the
above properties) as the send corresponding to a_k.

Notice that if the sender ignores the free notification and sends a message m several 
times before it is received, we make a correspondence between the receive and the 
earliest send event. 
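Under the same kind of hypothetical (kind, src, dst, msg) encoding, the send corresponding to a receive can be found by scanning backwards until the previous RECEIVEM_{v,u} event and keeping the earliest matching send. The sketch returns None for an abnormal receive.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding of a behavior: (kind, src, dst, msg) tuples, e.g.
# ("SENDM", "v", "u", "m1") or ("RECEIVEM", "v", "u", "m1").
Event = Tuple[str, str, str, Optional[str]]


def corresponding_send(beta: List[Event], k: int) -> Optional[int]:
    """Index of the send corresponding to beta[k] (Definition 7.3.2), or None if abnormal."""
    kind, v, u, m = beta[k]
    assert kind == "RECEIVEM", "beta[k] must be a RECEIVEM_{v,u}(m) event"
    earliest = None
    for i in range(k - 1, -1, -1):
        kind_i, src, dst, msg = beta[i]
        if (src, dst) != (v, u):
            continue                  # event on another link
        if kind_i == "RECEIVEM":
            break                     # sends before the previous receive do not qualify
        if kind_i == "SENDM" and msg == m:
            earliest = i              # keep scanning back for an earlier matching send
    return earliest
```

If the sender ignores the free notification and sends m twice, the receive is matched with the first of the two sends, as stated above.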

Specifying Timeliness 

Next, we formalize the timeliness property. We say that a behavior is timely if it
satisfies four conditions for every pair of neighbors u and v. First, all receive events
that occur after linear time in β are normal. In other words, after at most O(n) time,
each message received corresponds to some message sent. Also, any normal receive event
occurs within O(n) time after the corresponding send event. This is shown in Figure
7.3.

The second condition is that either the user at u periodically receives a free action
indicating that the link to v is free or else a signal event occurs periodically at either
u or v. Third, any message sent safely by v to u is either delivered in constant time or else
a signal event occurs (at either u or v) after the message is sent. The essence of the
second and third conditions is that, in the absence of signal events, free events are
delivered periodically and sent messages are delivered in constant time. Intuitively,
signal events are caused by reset requests; if reset requests are made continuously, then
the reset protocol cannot guarantee periodic free events or the delivery of messages.

Figure 7.3: Normal message receipt after linear time: There is a send action at v corresponding to any
message received at u after O(n) time. Also, any normal receive event occurs within O(n) time after the
corresponding send.

The fourth timeliness condition is important in applications (for example, in Chapter 8).
It says that signals cannot keep occurring at u without also occurring at a
neighbor v. More precisely, if a signal event occurs at u, then a signal event must occur
at v within linear time before or after the signal event at u. However, because the reset
protocol can start in an arbitrary state, we relax this requirement and only ask that
this property hold after a linear amount of time.

Definition 7.3.3 We say that a behavior β is timely if, for every pair of neighbors
u and v, the following four conditions hold:

1. Normal Receipt of Messages: There is some constant c such that every
receive event that occurs at time greater than β.start + c·n is normal. Also, if
a_k is any normal receive event and a_j is the send corresponding to a_k, then a_k
occurs within O(n) time after a_j.

2. Periodic Free Events: Consider any t-suffix γ of behavior β. Then in γ either
a FREEM_{u,v} occurs within constant time, or a signal event occurs at u within
O(n) time, or a signal event occurs at v within O(n) time.

3. Timely Message Delivery: Suppose a_j is a safe SENDM_{u,v}(m) event in β.
Then after a_j either a RECEIVEM_{u,v}(m) occurs within constant time, or a signal
event occurs at u within O(n) time, or a signal event occurs at v within O(n)
time.

4. Signals at a Node induce Signals at Neighbors: There is some constant c
such that for every SIGNAL_u event a_j that occurs at time greater than β.start + c·n,
there is a SIGNAL_v event that occurs within linear time before or after a_j.
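Condition 4 refers to asymptotic bounds, so any executable check must fix concrete constants. The Python sketch below is only an illustration. It reads "linear time" as "within d·n" for a hypothetical constant d and checks the condition for one ordered pair of neighbors (u, v), given lists of signal times.

```python
from typing import Sequence


def signals_induce_neighbor_signals(sig_u: Sequence[float], sig_v: Sequence[float],
                                    start: float, n: int, c: float, d: float) -> bool:
    """Timeliness condition 4 for neighbors u, v on concrete signal times.

    "Linear time" is read as "within d*n"; c and d are hypothetical constants.
    Signals at u before start + c*n are exempt, as in the definition.
    """
    return all(any(abs(tv - tu) <= d * n for tv in sig_v)
               for tu in sig_u if tu > start + c * n)
```

Applying the check to both ordered pairs (u, v) and (v, u) covers a link in both directions.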

Specifying Consistency 

Before we formalize the consistency property, we formalize the notion of a signal in- 
terval at a node in a behavior. 

Definition 7.3.4 Consider a behavior β. A signal interval at node u in β is a
contiguous subsequence of β that:

• Begins with either the start of β or a SIGNAL_u event, and

• Ends with either the first SIGNAL_u event that occurs after the start of the interval
or (if there is no such SIGNAL_u event) the end of β. In
the latter case we call the signal interval a final interval.
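A minimal Python sketch of Definition 7.3.4 follows. It assumes a simplified, hypothetical encoding of a behavior as a list of (kind, node) events and returns each interval as a (start, end, final) triple of event indices. Consecutive intervals share the SIGNAL_u event that separates them, because under the definition one interval ends with that event and the next begins with it.

```python
from typing import List, Tuple


def signal_intervals(beta: List[Tuple[str, str]], u: str) -> List[Tuple[int, int, bool]]:
    """Split behavior beta into the signal intervals at node u (Definition 7.3.4).

    An interval begins at the start of beta (index 0) or at a SIGNAL_u event and
    ends at the next SIGNAL_u event (inclusive), or at the end of beta if it is final.
    """
    if not beta:
        return []
    signals = [i for i, (kind, node) in enumerate(beta) if kind == "SIGNAL" and node == u]
    starts = sorted(set([0] + signals))   # a SIGNAL_u at index 0 starts the first interval
    intervals = []
    for s in starts:
        later = [i for i in signals if i > s]
        if later:
            intervals.append((s, later[0], False))
        else:
            intervals.append((s, len(beta) - 1, True))   # final interval
    return intervals
```

Every behavior yields exactly one final interval at each node, namely the one that starts at the last SIGNAL_u event (or at the start of β if there is none).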

Thus any behavior induces signal intervals at each node. We now specify the 
consistency condition by requiring a mating relation between signal intervals. However, 
the mating relation is weaker than the relation for Data Links because it does not 
require the prefix property. The third property is a little tricky. Consider Figure 
7.4. Suppose the send at v is safe and is followed by a free action at v that is in the 
same signal interval at v. Then we require that the sent message is delivered and the 
delivery event occurs between the send and the free events. In essence, this states that 
all messages (except possibly the last message) sent safely in a signal interval at v are 
delivered at u. 

Figure 7.4: Successful sending of messages: between the sending of a message and the free action that
indicates that the next message can be sent, either the message is delivered or a signal occurs at the sender.

Definition 7.3.5 We say that a behavior β satisfies the consistency property if for
every pair of neighbors u, v there is a symmetric relation (called a mating relation)
between signal intervals at u and v such that:

1. At most one mate: A signal interval at u can be mated to at most one signal
interval at v.

2. Signal Intervals that communicate are mates: Let a_k be any normal receive
event at u from v in β and let a_m be the send event corresponding to a_k. Then the
signal interval at u containing a_k is mated to the signal interval at v containing
a_m.

3. Successful Sending of Messages: Consider any safe SENDM_{v,u}(m) event a_j
and a later FREEM_{v,u} event that occur in the same signal interval at v. Then between
these two events in β there must be a normal RECEIVEM_{v,u}(m) event a_k such
that a_j is the send corresponding to a_k.

4. Mating of Final Signal Intervals: A final interval at u can only mate to a
final interval at v.

5. Mating Relation Preserves Temporal Ordering: Suppose a signal interval
S_u at u is mated to a signal interval S_v at v and a signal interval S'_u at u is mated
to a signal interval S'_v at v. Then if S'_u occurs later than S_u, then S'_v occurs later
than S_v.
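Condition 2 forces certain pairs of intervals to be mates, and conditions 1 and 5 then constrain which pairs can coexist. The hypothetical Python helper below takes, for each normal receive, the pair (index of its signal interval at u, index of the signal interval at v containing the corresponding send). It builds the forced mating and reports any violation of condition 1 or 5. Pairs arising from messages sent from u to v can be passed in the same (interval at u, interval at v) form. Conditions 3 and 4 are not checked.

```python
from typing import Dict, Iterable, Tuple


def mating_from_communication(pairs: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Build the mating forced by condition 2 and check conditions 1 and 5.

    Each pair (i, j) says that the i-th signal interval at u and the j-th signal
    interval at v (both numbered in time order) contain a normal receive and its
    corresponding send. Raises ValueError if no relation can satisfy 1, 2 and 5.
    """
    mate_of_u: Dict[int, int] = {}
    mate_of_v: Dict[int, int] = {}
    for i, j in pairs:
        # Condition 1, on both sides since the relation is symmetric.
        if mate_of_u.setdefault(i, j) != j or mate_of_v.setdefault(j, i) != i:
            raise ValueError(f"interval {i} at u or {j} at v would have two mates")
    ordered = sorted(mate_of_u.items())
    for (i1, j1), (i2, j2) in zip(ordered, ordered[1:]):
        if not j1 < j2:   # condition 5: later intervals at u need later mates at v
            raise ValueError(f"mates ({i1},{j1}) and ({i2},{j2}) reverse temporal order")
    return mate_of_u
```

A protocol that satisfies the consistency property can never produce input that makes this helper raise. The converse does not hold, because the helper ignores conditions 3 and 4.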

Suppose S_u and S_v are signal intervals at u and v respectively that are mates. Notice
that, as compared to a Data Link specification, we have weakened the requirements
for a mating relation: we no longer require that the sequence of messages received by v
from u is a prefix of the sequence sent from u to v. However, if all received messages
are normal and all sent messages are safe, then the third consistency condition does
imply the prefix property. The third and fourth consistency conditions also imply that
if S_v is a final interval, and if all received messages are normal and all sent messages
are safe, then the two sequences are identical.

The reader may wonder whether it is sufficient to specify consistent behavior only 
in final intervals. In that case, the consistency condition is much simpler. However, if 
the stabilizing reset protocol is to be used as a tool to build other stabilizing protocols 
(which is what we do in the next two chapters), then we claim that the reset protocol 
must make some guarantees during non-final intervals. A typical user of the reset 
protocol will be constantly doing some form of checking (see Chapter 8 for example) 
and will make a reset request if the checking detects a violation. But if the reset 
protocol exhibits arbitrary behavior during non-final intervals, then the user may
always detect violations. These in turn lead to continuous reset requests which prevent
the forming of a final interval. In other words, final intervals are only guaranteed if
the user stops making reset requests; but users may only stop making reset requests if
the reset protocol guarantees some form of consistency during non-final intervals. As
another example, some user protocols may periodically make reset requests to start a
new phase of the protocol. Such protocols (Chapter 9 describes an example of such
a protocol) never stop making reset requests! 

Specifying Causality 

A causal behavior satisfies two intuitive conditions: that signal events are only caused 
by reset requests and reset requests result in signal events. Ideally, any signal event 
must be preceded by a reset request that occurs a linear amount of time before the 
signal event. Notice that this guarantees that all signal events will disappear in linear 
time after the last reset request in a behavior. However, because the reset protocol can 
start in an arbitrary state, we relax this requirement and only ask that this property 
hold after a bounded amount of time. But a causal behavior should also ensure the flip 
side of the above condition: reset requests should result in signal events. A protocol 
that simply ignored reset requests would be useless. We specify the second condition 
by requiring that a reset request at a node u is followed in linear time by a signal event 
at node u. 

Definition 7.3.6 We say that a behavior β is causal if:

1. There is some constant c such that every signal event a_i that occurs at time greater than β.start + c·n is preceded by a request event a_j that occurs within linear time before a_i.

2. A SIGNAL_u event occurs within linear time after any REQUEST_u event.
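The two conditions of Definition 7.3.6 can be made concrete as a check on a finite, timed trace. The following Python sketch is only illustrative: the Event record, the function name is_causal, and the constant d, which stands in for "linear time" as a window of d·n time units, are assumptions introduced here, and condition 2 is checked only for requests whose deadline falls inside the trace.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    time: float
    kind: str  # "REQUEST" or "SIGNAL"
    node: str


def is_causal(events: List[Event], start: float, end: float,
              n: int, c: float, d: float) -> bool:
    """Check the two conditions of Definition 7.3.6 on a finite timed trace.

    c is the constant of condition 1; d is a hypothetical constant that
    stands in for "linear time" (a window of d * n time units).
    """
    window = d * n
    requests = [e for e in events if e.kind == "REQUEST"]
    signals = [e for e in events if e.kind == "SIGNAL"]

    # Condition 1: every signal after start + c*n must be preceded by some
    # request (at any node) no more than d*n time units earlier.
    for s in signals:
        if s.time > start + c * n and not any(
                s.time - window <= r.time <= s.time for r in requests):
            return False

    # Condition 2: every REQUEST_u is followed within d*n by a SIGNAL_u.
    # Requests whose deadline falls after the end of the trace are skipped.
    for r in requests:
        if r.time + window <= end and not any(
                s.node == r.node and r.time <= s.time <= r.time + window
                for s in signals):
            return False
    return True
```

For example, with n = 3, c = 2 and d = 2, a trace whose first event is a SIGNAL at time 0 can still be causal: that early signal falls inside the c·n grace period of condition 1, which models the arbitrary initial state.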

We are now ready to describe the behaviors that should be produced by a reset 
protocol. 

Definition 7.3.7 We define the reset problem RP to be the set of behaviors β that are timely, consistent and causal.

The following lemma is useful because it tells us that every suffix of a behavior in 
RP is also in RP. 

Lemma 7.3.8 If a behavior β is in RP, then any t-suffix of behavior β is in RP.

Proof: The proof consists mostly of looking at each property in the definition of RP and showing that if a suffix of β does not have the property then neither does β. The only tricky case is the property that all messages received after at most O(n) time from the start of a behavior are normal. However, this can be deduced from the fact that the send event corresponding to a normal receive event in β occurs at most O(n) time before the receive event. ∎



7.3.4 Alternative Specifications of Reset Problem 

Traditional definitions (e.g., [AAG87]) of a reset service define the correctness of a reset
service in terms of the states of protocol P, the user of the reset service. Our definition 
is more modular because it focuses on the behavior of the reset subsystem without any 
reference to the states of the users of the reset service. 

Next, one might like a reset protocol to satisfy a much stronger consistency property than the one specified above. In the stronger condition, the mating relation between signal intervals would also be transitive. For instance, suppose u, v and w are connected in a cycle, so that u and v, v and w, and w and u are pairs of neighbors. Also suppose that a signal interval S_u at u mates to a signal interval S_v at v, S_v mates to some S_w at w, and S_w mates to some interval S'_u at u. Then our consistency specification allows S_u to be different from S'_u. However, transitivity requires that S_u = S'_u. The stronger condition seems to capture the essence of network synchronization in that the signal events provide consistent time points across the entire network. However, we show in the appendix that our reset protocol (and the reset protocol of [AAG87]) does not satisfy the stronger condition. The applications in this thesis do not need the stronger condition.

Note that the weaker condition does imply that there is a transitive mating relation 
between final signal intervals at all nodes in the network. 

Also the specification of the behaviors of a stabilizing reset protocol is complicated 
by anomalies that occur at the beginning of the behavior such as abnormal messages 
and signals that are not "caused" by any requests. Given this difficulty, we might be 
tempted to specify the reset problem using suffixes of the behaviors of a non-stabilizing 
reset protocol. This results in a more elegant definition. However, if we choose that 
definition, then we do not know any reasonably simple proof technique to show that a 
protocol stabilizes to behaviors that are suffixes of a specified set of behaviors. 2 



7.4 Overview of the solution 

In this section, we will give an overview of a stabilizing reset protocol. Our solution 
basically consists of stabilizing the reset protocol of [AAG87] using the method of 
Chapter 5. In the first subsection, we describe a simple reset protocol that is not 
stabilizing. In the next subsection, the problems of the simple reset protocol are used 
to motivate the main ideas behind our reset protocol. Next, we give an overview of 
the code. Finally, we end this section with an example execution of our reset protocol. 
The next section contains a formal description of our reset protocol. 



2 Such proofs seem to involve showing that every initial state of an automaton A is a reachable state 
of another automaton B. But familiar inductive proof techniques (such as invariant arguments, progress 
metrics etc.) do not seem to suffice to show that one state is reachable from another state. 
7.4.1 Problems with a Simple Reset Protocol

Amazingly, the consistency condition for normal reset behaviors can be guaranteed by the following Simple Reset Protocol (SRP). This is a non-stabilizing protocol, so we assume that it begins in a state in which no queue or link contains any messages.

In the absence of reset requests, nodes are in the so-called Ready state. In this state, any message sent (say, from u to v) is queued by u in an outbound queue for the link. When the message arrives at v, it is stored in a buffer which we will call buffer_v[u]. Eventually, the message stored in buffer_v[u] is delivered to the user at v. Thus in the absence of reset requests, sent messages are delivered in FIFO order. In the following two-paragraph description of the protocol, when we say "node m" we mean "the reset protocol at node m".

When node u receives a reset request (REQUEST_u), the reset protocol at u sends an ABORT packet to all its neighbors and goes into a special Abort mode where it waits until it gets an ABORT packet from all its neighbors. It does so by setting a boolean flag ack_u[v] for all neighbors v to indicate that it is expecting an ABORT packet from v. A node u that receives an ABORT packet in the Ready mode behaves in almost the same way as a node that receives a reset request, i.e., u sends an ABORT packet to all its neighbors. However, in this case u sets ack_u[v] for all neighbors v except the neighbor from which it received the ABORT packet. If u receives an ABORT packet from neighbor v and ack_u[v] = true then u sets ack_u[v] = false. As soon as ack_u[v] = false for all neighbors v, u returns to the Ready mode and performs a SIGNAL_u action.

The consistency condition is guaranteed by two additional rules. First, when an ABORT packet from u arrives at v, buffer_v[u] is emptied. Second, no packet in buffer_v[u] is delivered while v is in Abort mode, nor before v has performed any outstanding SIGNAL_v action. Intuitively, sending an ABORT packet on a link and waiting until another ABORT packet returns on the link effectively "flushes" any old messages that were sent in previous signal intervals. Essentially, we send an ABORT packet between any two SIGNAL events at a node and we delay delivery of packets until an outstanding signal has been performed. This ensures that a signal interval at u "communicates" or mates with at most one signal interval at v. The Simple Reset Protocol (SRP) is similar to the Chandy-Lamport snapshot protocol [CL85], with ABORT packets replacing "markers".

The problem with SRP is that it can easily be placed in a state where it never terminates, i.e., it violates the causality property. Consider the topology shown in Figure 7.5. Suppose in the initial state there is an ABORT packet in the channel from u to v, and u is waiting for an ABORT from v (but not from w) before u can exit from the Abort mode. Nodes v and w are in the Ready mode, and all other links are empty. Next, assume that the ABORT packet arrives at v, which causes v to enter Abort mode and send an ABORT packet to u and w. Assume that the ABORT packet sent to u arrives quickly and causes u to return to Ready mode. Notice that by the rules of the protocol, v does not expect an ABORT from u. The resulting state is shown in Figure 7.5. By symmetry, it is clear that the execution can remain in a cycle of states where an ABORT packet continuously cycles through the network.

Figure 7.5: Two "bad" states of the Simple Reset Protocol. The black dot before an edge indicates that a node is waiting for an ABORT on that edge.
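The non-terminating execution described above can be replayed mechanically. The sketch below is a hypothetical Python model of the SRP abort rules only (no user messages); the names NEIGHBORS, NEXT, make_bad_state, deliver_abort and shape are illustrative. Starting from the first bad state, three rounds of "deliver the ABORT, then deliver the fast reply" return the triangle to exactly the same configuration, while every node has performed one SIGNAL, so the execution can repeat forever.

```python
from collections import deque

# Triangle topology of Figure 7.5 (node names are illustrative).
NEIGHBORS = {"u": ["v", "w"], "v": ["u", "w"], "w": ["u", "v"]}
NEXT = {"u": "v", "v": "w", "w": "u"}  # rotation used by the symmetry argument


def make_bad_state():
    """u waits for an ABORT from v only; one ABORT is in transit from u to v."""
    chan = {(a, b): deque() for a in NEIGHBORS for b in NEIGHBORS[a]}
    nodes = {x: {"expect": set(), "signals": 0} for x in NEIGHBORS}
    nodes["u"]["expect"] = {"v"}
    chan[("u", "v")].append("ABORT")
    return {"chan": chan, "nodes": nodes}


def deliver_abort(state, src, dst):
    """Deliver the ABORT at the head of channel src -> dst using the SRP rules."""
    pkt = state["chan"][(src, dst)].popleft()
    assert pkt == "ABORT"
    node = state["nodes"][dst]
    if not node["expect"]:
        # dst is in Ready mode: flood ABORTs, expecting replies from all
        # neighbors except the one the ABORT came from.
        for x in NEIGHBORS[dst]:
            state["chan"][(dst, x)].append("ABORT")
        node["expect"] = {x for x in NEIGHBORS[dst] if x != src}
        if not node["expect"]:  # degree-1 node: nothing to wait for
            node["signals"] += 1
    elif src in node["expect"]:
        node["expect"].discard(src)
        if not node["expect"]:  # all expected ABORTs seen: Ready plus SIGNAL
            node["signals"] += 1


def shape(state):
    """The state minus the signal counters, for comparing configurations."""
    return ({x: frozenset(n["expect"]) for x, n in state["nodes"].items()},
            {k: tuple(q) for k, q in state["chan"].items()})


state = make_bad_state()
initial = shape(state)
a, b = "u", "v"
for _ in range(3):
    deliver_abort(state, a, b)  # the ABORT reaches b, which floods and waits
    deliver_abort(state, b, a)  # b's fast ABORT releases a, which signals
    a, b = b, NEXT[b]
```

After the loop, shape(state) equals initial and each node has signalled once, which is exactly the cycle of states described in the text.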

Now, since SRP is a non-stabilizing reset protocol, it may be possible to show that (after proper initialization) SRP will never enter such a "bad state". If the network is static, then this is indeed true. However, if the network can have links that fail and recover (a so-called "dynamic network"), then a series of link failures and recoveries can leave SRP in the "bad" state depicted in Figure 7.5.

More importantly for our purposes, SRP does not seem like a suitable point of departure for constructing a stabilizing reset protocol. Specifically, there does not appear to be any easy way to make SRP locally checkable. For instance, consider the topology of Figure 7.5. In a "good" state of SRP, if there is an ABORT packet in the cycle, there must be at least one other ABORT packet in the cycle, which is travelling in the "opposite" direction. This seems hard to check locally, even with the addition of a small amount of state. Instead, our point of departure is the AAG reset protocol of [AAG87]. This protocol works in dynamic networks and can be made locally checkable and correctable. We describe some more details of how the AAG protocol works, and why it avoids the problems of the Simple Reset Protocol, in the appendix. The appendix also contains a description of the changes we made to make the AAG protocol stabilizing. For the rest of this chapter, we will describe our reset protocol, which is based on (but not identical to) the AAG protocol.

7.4.2 The basic idea behind the Reset protocol 

Our reset protocol uses essentially the same idea as SRP for ensuring consistency: once again, consistency is ensured by using ABORT packets to "flush" old packets (that were sent in a previous signal interval) from links. However, our protocol is much more conservative about allowing a node to return to Ready mode.

A global (but approximate) summary is as follows. The protocol responds to reset requests in three phases. In the first phase, ABORT packets are sent out from nodes that receive reset requests. Nodes that receive ABORT packets while in the Ready mode broadcast ABORT packets to their neighbors. Thus the first phase consists of a wave of abort packets that spreads outwards from a reset request in much the same way as in SRP. However, in our protocol the abort waves create an abort forest as they spread outwards. Consider any node u. Node u's parent in the abort forest is the neighbor from whom u last received an ABORT packet that caused u to leave Ready mode. If node u left Ready mode because u received a reset request, then u has no parent and u is considered a root in the abort forest. Notice that the abort forest is a set of ad hoc abort trees that are created while reset requests are being processed.

In the second phase, a node sends an ack to its parent when the subtree rooted at that node has stopped expanding. Thus, in the second phase a wave of acks flows up the abort trees to the roots. The first and second phases work in much the same way as the Dijkstra-Scholten termination detection algorithm [DS80]. As in the Dijkstra-Scholten algorithm, some nodes may be in the first phase (forward propagation of ABORT packets) while other nodes may be in the second phase (sending acks up to parents).

What distinguishes our protocol from the Dijkstra-Scholten protocol is that there is a third phase in our protocol. When a root of an abort tree receives acks from all its children, it starts a wave of READY packets that flows down the tree and allows all nodes in the tree to return to the Ready mode. Thus the crucial difference between SRP and the AAG protocol is as follows. In the former, nodes return to Ready mode after communicating only with their neighbors. However, in our protocol a node u returns to Ready mode only after the root of the abort tree (that u is part of) knows that the abort wave has stopped propagating, and has informed u of this fact using a READY packet. Thus in the SRP protocol a node returns to Ready mode in constant time after it receives an ABORT packet. In our protocol a node may have to wait O(n) time to return to Ready mode (potentially three worst-case delays across the network, one for each phase).

The reader may wonder why three phases are needed. The appendix provides some 
intuition by describing why three phases were used in the original AAG protocol to 
avoid the problems of the Simple Reset Protocol. 

7.4.3 Overview of the Code 

The code of our reset protocol works as follows. 

Each node u has three interesting variables. First, each node has a mode, mode_u, which is one of Ready, Abort or Converge. Ready is the "normal" mode of a node when it is not processing a reset request. If the mode of u is Abort or Converge, then u is processing a reset request.

Next, each node u has an ack bit ack_u[v] for each neighbor v. If this bit is set, it indicates that u is waiting for an ack from v. (Unlike the Simple Reset Protocol, our protocol uses explicit ACK packets.) Finally, u has a parent variable parent_u that points to the neighbor from which u received the reset request that it is processing. If u received a reset request through a REQUEST_u action (i.e., a reset request directly at u itself) then u sets parent_u = nil. If parent_u = nil, we will say that u is a root of an abort tree. A list of variables used by the code is shown in Figure 7.6.

In our code, the mode of a node is characterized by the state of the other variables 
at the node. Thus the mode of node u is really a derived variable: 



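$$
\mathit{mode}(u) =
\begin{cases}
\text{Abort}, & \text{if } \exists v \text{ such that } \mathit{ack}_u[v] = \text{true},\\
\text{Converge}, & \text{if } \mathit{parent}_u \neq \text{nil} \text{ and } \forall v\,(\mathit{ack}_u[v] = \text{false}),\\
\text{Ready}, & \text{if } \mathit{parent}_u = \text{nil} \text{ and } \forall v\,(\mathit{ack}_u[v] = \text{false}).
\end{cases}
$$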



Notice that unlike the Simple Reset Protocol, we have a third mode called Converge. 
State

ack_u[v]: Boolean flag for each neighbor v of u

parent_u: Either one of the neighbors of u or nil

dist_u: integer in the range 0..n', where n' is an upper bound on the number of nodes in the graph.

We also assume that for any (ABORT, d) packet, d is an integer in the range 0..n'.

signalbit_u: Boolean flag (*used to remember to do a signal event*)

free_u[v]: Boolean flag for each neighbor v of u (*says whether link to v is ready to accept a packet*)

freem_u[v]: Boolean flag for each neighbor v of u (*says whether v is ready to accept a message*)

queue_u[v]: queue of size 5 consisting of packets drawn from Pdata (*outbound queue for link to v*)

buffer_u[v]: queue of size 1 that can contain an S-message only (*inbound message queue for link from v*)

The following piece of code is used as a macro to propagate ABORT packets:

PROPAGATE_u(w, dist) =
    parent_u := w
    dist_u := dist
    For all neighbors v of u do (*broadcast abort packets to neighbors*)
        ack_u[v] := true
        enqueue (ABORT, dist + 1) on queue_u[v]

Figure 7.6: Variables and a Useful Macro
If mode(u) = Converge, this means that u has received acks from all its children and is waiting for a READY packet from its parent.
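The derived mode and the PROPAGATE macro of Figure 7.6 translate directly into code. This is a minimal Python sketch under our own naming assumptions (the class ResetNode and its fields are hypothetical, and None plays the role of nil); it shows that the mode is never stored but is recomputed from ack_u and parent_u.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Packet = Tuple[str, int]  # e.g. ("ABORT", d)


@dataclass
class ResetNode:
    """Hypothetical container for the reset variables of one node u."""
    name: str
    neighbors: List[str]
    parent: Optional[str] = None  # parent_u; None plays the role of nil
    dist: int = 0                 # dist_u
    ack: Dict[str, bool] = field(default_factory=dict)            # ack_u[v]
    queue: Dict[str, List[Packet]] = field(default_factory=dict)  # queue_u[v]

    def __post_init__(self) -> None:
        for v in self.neighbors:
            self.ack.setdefault(v, False)
            self.queue.setdefault(v, [])

    def mode(self) -> str:
        """Derived variable mode(u): never stored, always recomputed."""
        if any(self.ack.values()):
            return "Abort"
        return "Converge" if self.parent is not None else "Ready"

    def propagate(self, w: Optional[str], d: int) -> None:
        """The PROPAGATE_u(w, dist) macro of Figure 7.6."""
        self.parent = w
        self.dist = d
        for v in self.neighbors:
            self.ack[v] = True
            self.queue[v].append(("ABORT", d + 1))
```

For instance, after ResetNode("u", ["v", "w"]).propagate("v", 2) the node is in Abort mode and each outbound queue holds ("ABORT", 3); once every ack bit is cleared, the non-nil parent makes the mode Converge rather than Ready.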

There are also three interesting packets used by the protocol: abort packets (that are encoded as tuples of the form (ABORT, d), where d is an integer representing distance from the root), ack packets (that are encoded simply as ACK), and ready packets (that are encoded simply as READY).

The protocol will deadlock if it is placed in an initial state in which the parent pointers form a cycle. In order to be able to locally check that the subgraph induced by the parent pointers is acyclic, we maintain a distance variable at each node, such that a node's distance is one greater than that of its parent. Specifically, the distance is initialized to 0 upon a reset request, and each abort packet that a node sends carries the node's distance plus one. Thus we encode an abort packet as a tuple (ABORT, d), where d is a distance.
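To see why the distance variable makes acyclicity locally checkable, consider a simplified form of the distance rule that ignores any exception for a READY packet in transit. The Python sketch below is illustrative only (distances_consistent is a hypothetical name): each comparison involves one node and its parent, yet no distance assignment can satisfy all comparisons on a cycle of parent pointers, because the +1 steps around the cycle would have to sum to zero.

```python
from itertools import product
from typing import Dict, Optional


def distances_consistent(parent: Dict[str, Optional[str]],
                         dist: Dict[str, int]) -> bool:
    """Simplified distance rule: dist[child] == dist[parent] + 1.

    Each comparison reads only a node and its parent, so it can be
    evaluated one link subsystem at a time.
    """
    return all(p is None or dist[c] == dist[p] + 1 for c, p in parent.items())


tree = {"u": None, "v": "u", "x": "v"}   # u is a root, x is a grandchild
print(distances_consistent(tree, {"u": 0, "v": 1, "x": 2}))  # True

cycle = {"u": "w", "v": "u", "w": "v"}  # parent pointers form a cycle
# Exhaustively try small distance assignments: none passes on a cycle.
print(any(distances_consistent(cycle, dict(zip("uvw", ds)))
          for ds in product(range(4), repeat=3)))  # False
```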

The code that implements most of the reset protocol is shown in Figure 7.7. For convenience, we have marked certain transitions in the figure with the labels VR, VA, DA, IA, FA, RA and RR.

A VR (for Valid Request) transition is a reset request that causes a node to change its mode to Abort. A VA (for Valid Abort) transition is the receipt of an (ABORT, d) packet with a valid distance field that causes a node to change its mode to Abort. A DA (for Distance Invalid Abort) transition is the receipt of an (ABORT, d) packet such that the distance field d is at the maximum value and such that its receipt causes a node to change its mode to Abort.

An IA (for Invalid Abort) transition is the receipt of an (ABORT, *) packet that does not cause a node to change its mode to Abort. An FA (for Final Ack) transition is the receipt of an ACK packet that causes a node, say u, to send an ACK packet to its parent. It is not hard to see that the ack that was received must have been the last ack that node u was waiting for. An RA (for Root Ack) transition is the receipt of an ACK packet at a root node that causes the root node to change its mode to Ready. An RR (for Regular Ready) transition is the receipt of a READY packet at a node that causes the node to change its mode to Ready.

Refer to these labels in Figure 7.7 when following the description below. 

The code in Figure 7.7 uses a small macro called PROPAGATE_u(v, d). This procedure is used to broadcast (ABORT, *) packets and is shown in Figure 7.6. The first parameter to the procedure specifies that v is the new parent of u, and the second parameter specifies that d is the distance from u to the root of u's abort tree. The abort packets sent out as a result of this procedure (see Figure 7.6) carry a distance value of d + 1. We add the distance variable to abort packets in order to make the protocol locally checkable.

Actions to Execute Reset Protocol

REQUEST_u (*receive a reset request from user at node u, input action*)
    If mode(u) = Ready then
        VR: PROPAGATE_u(nil, 0) (*broadcast abort packets to all neighbors and set parent_u = nil*)

RECEIVE_{v,u}(ABORT, dist) (*receive abort packet from neighbor v, input action*)
    If buffer_u[v] is not empty then
        Empty buffer_u[v] (*flush any old message in buffer*)
        Enqueue S-Ack in queue_u[v] (*send a message ack allowing v to send another message*)
    VA: If mode(u) = Ready and dist < n' then
            PROPAGATE_u(v, dist) (*broadcast abort packets to all neighbors*)
    DA: Else if mode(u) = Ready and dist = n' then (*distance at max value; become a root and ack*)
            PROPAGATE_u(nil, 0)
            Enqueue ACK in queue_u[v] (*send back an ACK as well*)
    IA: Else enqueue ACK in queue_u[v] (*send back an ACK*)

RECEIVE_{v,u}(ACK) (*receive ack packet from neighbor v, input action*)
    If ack_u[v] = true then
        ack_u[v] := false
        FA: If mode(u) = Converge then (*no acks outstanding and not a root?*)
                Enqueue ACK in queue_u[parent_u] (*ack parent*)
        RA: Else if mode(u) = Ready then (*no acks outstanding and a root?*)
                signalbit_u := true (*remember to do a signal later*)
                For all neighbors x of u do
                    Enqueue READY in queue_u[x] (*broadcast READY*)

RECEIVE_{v,u}(READY) (*receive READY packet from neighbor v, input action*)
    RR: If parent_u = v and mode(u) = Converge then (*READY expected and from parent?*)
            parent_u := nil (*return to Ready mode*)
            signalbit_u := true (*remember to do a signal later*)
            For all neighbors x of u do
                Enqueue READY in queue_u[x] (*broadcast READY*)

SIGNAL_u (*deliver reset signal to user at u*)
    Preconditions: signalbit_u = true
    Effects: signalbit_u := false

Figure 7.7: Actions at node u to execute reset protocol functions with respect to any neighbor v.

When a reset request is made at some node u while in Ready mode (VR), u changes its mode to Abort, broadcasts an ABORT packet to all its neighbors, and sets its ack bits to true for all neighbors v. Node u then waits until all the neighboring nodes send back an ack packet. If node u receives an abort packet while in Ready mode (VA), it marks the neighbor from which the packet arrived as its parent, broadcasts ABORT, and waits for ACK packets to be received from all its neighbors. We also add a check to see whether the distance in the ABORT packet is less than n', which is the maximum value of the distance variable at a node. In linear time after all local predicates hold, this condition will always hold. However, this check helps to ensure that the local predicates remain closed during initial periods. If the distance check fails (DA), node u becomes a root and sends out abort packets just as if it had received a reset request; however, node u also sends back an ACK to the ABORT packet it received. Finally, if the ABORT packet is received by a node not in Ready mode (IA), an ACK is sent back immediately.

When node u receives an ACK from v it sets the ack bit for v to false. The action of node u when it receives the last anticipated ack depends on the value of u's parent. If u's parent is not nil (FA), u's mode changes to Converge, and an ACK is sent to the parent. Notice that since mode_u is a derived variable, mode_u = Converge as soon as ack_u[v] = false for all neighbors v of u (and parent_u ≠ nil). If u is a root (RA), u's mode becomes Ready (parent_u is already nil), and u broadcasts a ready packet to all neighbors. If node u gets a READY packet from its parent while in Converge mode (RR), then u changes its mode to Ready (by setting parent_u = nil), and broadcasts a ready packet to all neighbors.
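The transitions of Figure 7.7 can be exercised in a small simulation. The following Python sketch is an illustrative model under several assumptions: it starts from a clean state, treats each link as an unbounded FIFO channel (ignoring the free_u[v] and queue_u[v] details of Figure 7.8), omits S-messages and the buffer flush, collapses signalbit_u and the SIGNAL_u action into a counter, and delivers packets in a random order. The function run_reset and its parameters are hypothetical.

```python
import random
from collections import deque


def run_reset(graph, requesters, n_max, seed=0):
    """Simulate the abort/ack/ready transitions of Figure 7.7 (sketch).

    graph maps each node to its list of neighbors; requesters get a REQUEST
    before any packet moves. Returns the final modes and signal counts.
    """
    rng = random.Random(seed)
    parent = {u: None for u in graph}                        # parent_u
    dist = {u: 0 for u in graph}                             # dist_u
    ack = {u: {v: False for v in graph[u]} for u in graph}   # ack_u[v]
    signals = {u: 0 for u in graph}    # stands in for signalbit_u / SIGNAL_u
    chan = {(u, v): deque() for u in graph for v in graph[u]}

    def mode(u):
        if any(ack[u].values()):
            return "Abort"
        return "Converge" if parent[u] is not None else "Ready"

    def send(u, v, pkt):
        chan[(u, v)].append(pkt)

    def propagate(u, w, d):                    # PROPAGATE_u(w, d)
        parent[u], dist[u] = w, d
        for v in graph[u]:
            ack[u][v] = True
            send(u, v, ("ABORT", d + 1))

    def signal_and_broadcast_ready(u):         # shared tail of RA and RR
        signals[u] += 1
        for x in graph[u]:
            send(u, x, "READY")

    def receive(u, v, pkt):                    # pkt arrives at u from v
        if isinstance(pkt, tuple):             # (ABORT, d)
            d = pkt[1]
            if mode(u) == "Ready" and d < n_max:        # VA
                propagate(u, v, d)
            elif mode(u) == "Ready" and d == n_max:     # DA
                propagate(u, None, 0)
                send(u, v, "ACK")
            else:                                       # IA
                send(u, v, "ACK")
        elif pkt == "ACK" and ack[u][v]:
            ack[u][v] = False
            if mode(u) == "Converge":                   # FA
                send(u, parent[u], "ACK")
            elif mode(u) == "Ready":                    # RA
                signal_and_broadcast_ready(u)
        elif pkt == "READY" and parent[u] == v and mode(u) == "Converge":
            parent[u] = None                            # RR
            signal_and_broadcast_ready(u)

    for u in requesters:                       # VR
        if mode(u) == "Ready":
            propagate(u, None, 0)
    while True:
        busy = [k for k, q in chan.items() if q]
        if not busy:
            break
        src, dst = rng.choice(busy)
        receive(dst, src, chan[(src, dst)].popleft())
    return {u: mode(u) for u in graph}, signals
```

On a four-node ring (edges u-v, u-w, v-x and w-x, which is how we read the topology of the example that follows), a single request makes every node return to Ready and signal exactly once; with concurrent requests at u and w, every node still returns to Ready and signals at least once.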

Finally, whenever node u changes its mode from either Abort or Converge to Ready, it sets a flag (which we call signalbit_u) to remind itself to later output a signal event. In other words, a SIGNAL_u event is enabled whenever signalbit_u is set; naturally, when this event occurs, the flag is cleared. For convenience, we introduce the following definition.
Definition 7.4.1 We say that node u has a status of on whenever signalbit_u = false and mode(u) = Ready, and off otherwise.

The code that implements the sending and receiving of S-messages is shown in 
Figure 7.8. 

If the status of u is on, u relays S-messages between the user and the network. More specifically, when a SENDM_{u,v}(m) event occurs and node u has status on, u queues m on an outbound queue (called queue_u[v] as in previous chapters) that contains packets to be sent to v. This queue can also contain an abort, ack or ready packet. Eventually, packets in this outbound queue are delivered to neighbor v. When v receives an S-message m from u, v queues m in an inbound message buffer called buffer_v[u]. Later, if v's status is on, it will do a RECEIVEM_{u,v}(m) event, remove m from the buffer, and deliver it to the user. See Figure 7.9 for a sketch of the inbound and outbound queues for a link.

If u's status is not on, u discards S-messages input by the user through SENDM_{u,v} actions. Also, u will not do a RECEIVEM_{v,u} action unless its status is on. Recall that all S-messages from a neighbor v are queued on buffer_u[v]. In order to separate packets that are sent during separate signal intervals at v, we use the same rule as the Simple Reset Protocol: when an abort packet arrives from neighbor v, buffer_u[v] is emptied. Once again, this simple rule is really the key to ensuring the conditions required to satisfy the consistency property.

Note that buffer_u[v] can store at most one message. Clearly, if user messages are not to be dropped, we have to rely on the FREEM action as a form of "flow control". Our scheme is as follows. We require that there is at most one S-message in transit from u to v. Thus u keeps a variable freem_u[v]. Whenever v delivers (or destroys) an S-message from u, v sends an S-Ack back to u. When u receives an S-Ack from v, u sets freem_u[v] to true. This enables the FREEM_{u,v} action, which tells the user at u that it can safely send a message to v. Thus all we are doing is using an S-Ack message to provide feedback to the sender that the buffer at the other end is empty.
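The flow-control and flush rules can be illustrated on a single direction of one link. The Python class below is a hypothetical sketch, not the automaton of Figure 7.8: both nodes are assumed to have status on, ABORT handling is reduced to the buffer flush, and the class and method names are invented for illustration. It shows that at most one S-message is outstanding, and that a message destroyed by an ABORT still produces an S-Ack, so the sender is not blocked forever.

```python
from collections import deque
from typing import List, Optional


class SMessageLink:
    """Hypothetical model of one direction (u -> v) of the S-message path.

    Only the flow-control and flush rules are modelled; both nodes are
    assumed to have status on, and ABORT handling is reduced to the flush.
    """

    def __init__(self) -> None:
        self.freem = True                   # freem_u[v]
        self.to_v: deque = deque()          # packets in transit u -> v
        self.to_u: deque = deque()          # packets in transit v -> u
        self.buffer: Optional[str] = None   # buffer_v[u], capacity one
        self.delivered: List[str] = []      # messages handed to v's user

    def sendm(self, m: str) -> bool:
        """SENDM_{u,v}(m): accepted only if flow control allows it."""
        if not self.freem:
            return False                    # discarded, as in Figure 7.8
        self.to_v.append(("MSG", m))
        self.freem = False
        return True

    def send_abort(self) -> None:
        self.to_v.append(("ABORT", None))

    def deliver_to_v(self) -> None:
        kind, payload = self.to_v.popleft()
        if kind == "MSG":
            self.buffer = payload           # store in buffer_v[u]
        elif kind == "ABORT" and self.buffer is not None:
            self.buffer = None              # flush the old message ...
            self.to_u.append("S-ACK")       # ... and free the sender

    def receivem(self) -> None:
        """RECEIVEM_{u,v}(m): hand the buffered message to the user at v."""
        if self.buffer is not None:
            self.delivered.append(self.buffer)
            self.buffer = None
            self.to_u.append("S-ACK")

    def deliver_to_u(self) -> None:
        if self.to_u.popleft() == "S-ACK":
            self.freem = True               # FREEM_{u,v} is enabled again
```

A short run: "a" is accepted and a second send is refused until the S-Ack returns; a later message "c" that is still sitting in buffer_v[u] when an ABORT arrives is flushed without delivery, and the resulting S-Ack re-enables sending.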

Actions to send and receive S-messages

SENDM_{u,v}(m), m ∈ S (*input action by which user sends a message*)
    If mode(u) = Ready and signalbit_u = false and freem_u[v] = true then
    (*accept message only if mode is Ready, no outstanding signals, and message flow control says OK*)
        enqueue m in queue_u[v] (*enqueue m in outbound queue to v*)
        freem_u[v] := false (*inhibit delivery of free event to user*)

SEND_{u,v}(p) (*output action to send a packet on UDL link to v*)
    Preconditions:
        p is the head of queue_u[v] (*p is head of outbound queue for link*)
        free_u[v] = true
    Effects:
        Delete p from queue_u[v]
        free_u[v] := false

FREE_{u,v} (*input action from link to v to say it is free*)
    free_u[v] := true

RECEIVE_{v,u}(m), m ∈ S (*input action to receive a user message from v*)
    Enqueue m in buffer_u[v] (*store m in inbound buffer for link*)

RECEIVE_{v,u}(S-Ack) (*input action to receive an S-message ack from v*)
    freem_u[v] := true (*record that v is ready to accept a new message*)

RECEIVEM_{v,u}(m), m ∈ S (*output action to deliver a user message from v to the user*)
    Preconditions:
        m is the head of buffer_u[v]
        mode(u) = Ready and signalbit_u = false (*mode is Ready and no outstanding signals*)
    Effects:
        Remove m from buffer_u[v]
        Enqueue S-Ack in queue_u[v] (*send a message ack back to v*)

FREEM_{u,v} (*output action to tell user that it can safely send another message to v*)
    Preconditions:
        freem_u[v] = true and signalbit_u = false and mode(u) = Ready
    Effects:
        None

Figure 7.8: Actions at a node u to send and receive user messages to and from any neighbor v of u.

[Figure 7.9 diagram: a message sent from u to v passes through queue_u[v], the outbound packet queue at u, which can contain protocol packets and a user message; a message received from u at v is held in buffer_v[u], the inbound message buffer at v.]

Figure 7.9: Sketch of the inbound and outbound queues for a link.

7.4.4 Example

Figure 7.10 describes a sample execution of the reset protocol for a simple topology consisting of four nodes u, v, w and x connected as shown in the figure. The figure describes seven snapshots (F1-F7) taken during this sample execution.

The execution begins (F1) with nodes u and w receiving a reset request. Nodes u and w go into Abort mode and send abort packets to their neighbors. In F2, the abort packet from u has arrived at v. Next, u and w receive each other's abort packets and (F3) send back acks to each other. We assume that the abort from w to x travels slowly, which allows the abort packet sent from v to arrive at x earlier (F3). Thus in F3, the abort tree rooted at u (which is shown using dotted lines) consists of u, v and x.

Next, x sends an abort to both v and w, which are acked immediately. Then x receives the "slow" abort from w and sends back an ack. Once x gets an ack back from both v and w, x sends back an ack to its parent v (F4). When v receives this ack, it sends back an ack to its parent u (F5). Finally, in F6, u sends a ready packet down to v, and then (F7) v sends a ready packet to x. Note that the ready packet destroys the abort tree as it travels down the tree.



7.5 Reset Automaton 

In describing the reset code, we use the following notation. As in any node automaton, there is an outbound queue of packets (queue_u[v]) at any node u for every neighbor v of u. As usual, queue_u[v] consists of packets waiting to be sent on C_{u,v}.

Each node automaton N_u is specified in Figures 7.6, 7.7, and 7.8. The code uses the variables and macro specified in Figure 7.6. The piece of code that deals with implementing the main part of the reset protocol is described in Figure 7.7. The piece of code that deals with sending and receiving S-messages is described in Figure 7.8.

Figure 7.10: A sample execution for the reset protocol described in terms of seven snapshots.



7.6 Reset Protocol is Locally Checkable and Correctable

Let G be the topology graph that models the topology of the network on which the 
reset protocol works. 

Lemma 7.6.1 N = {N_u, u ∈ G} is a set of node automata for G with Pdata = S ∪ {(ABORT, *), ACK, READY}.

Proof: Simple checking of the definitions given earlier. ∎

7.6.1 Overview of Predicates 

Next, we prove that the protocol is locally checkable, by describing a set of predicates 
in Figure 7.14 and showing that these predicates are closed. The description of the 
predicates uses the shorthand notation shown in Figure 7.13. Please refer to both 
these figures during the following discussion. 

Recall that Q_{u,v} is the queue corresponding to the single packet that can be stored in channel C_{u,v}. For convenience, we define xqueue_u[v] (i.e., the extended queue between u and v) as the queue formed by the concatenation of Q_{u,v} and queue_u[v]. Thus in Figure 7.9, the extended queue between u and v is the queue formed by concatenating the outbound link queue and the link itself. Note that we do not include the inbound message buffer! If Q_{u,v} ≠ nil, we assume that Q_{u,v} is the head of xqueue_u[v].

We informally describe the predicates. The first two predicates A and B deal with the ack flag ack_u[v] at a node u. Intuitively, this bit is set if u expects an ack from v. A states that if u is expecting an ack from v then one of three possibilities must be true: either there is an abort packet in transit from u to v (Case 1 in Figure 7.11), OR v has received the abort packet and has chosen u as its parent (Case 2 in Figure 7.11), OR there is an ack packet in transit from v to u (Case 3 in Figure 7.11). Predicate B says that at most one of these three possibilities can be true at the same time. In some sense, A and B govern the first two phases of the reset protocol.

[Figure 7.11 diagram: Case 1, an ABORT packet in transit from u to v; Case 2, mode(v) = Abort and v's parent is u; Case 3, an ACK packet in transit from v to u.]

Figure 7.11: The three cases for the first and second predicates. The black dot before an edge indicates that a node is waiting for an ACK on that edge.

Consider Figure 7.10. In that example execution, the first case is illustrated in F2, 
the second case in F3 and F4, and the third in F5. 
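Predicates A and B only read the state of one link subsystem, so a checker for them needs no global information. The Python sketch below encodes the informal statements given above, not the exact predicates of Figure 7.14; the packet encodings and the function names xqueue and check_a_b are assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def xqueue(in_channel: Optional[object], queue: Sequence[object]) -> List[object]:
    """xqueue_u[v]: Q_{u,v} (the packet in the channel, if any), then queue_u[v]."""
    return ([in_channel] if in_channel is not None else []) + list(queue)


def check_a_b(u: str, ack_u_v: bool, xq_uv: Sequence[object],
              xq_vu: Sequence[object], parent_v: Optional[str],
              mode_v: str) -> Tuple[bool, bool]:
    """Informal versions of predicates A and B for the link subsystem (u, v)."""
    case1 = any(isinstance(p, tuple) and p[0] == "ABORT" for p in xq_uv)
    case2 = parent_v == u and mode_v == "Abort"
    case3 = "ACK" in xq_vu
    cases = [case1, case2, case3]
    pred_a = (not ack_u_v) or any(cases)  # A: a pending ack must be explained
    pred_b = sum(cases) <= 1              # B: by at most one of the cases
    return pred_a, pred_b
```

A state like F2 (u waits for v while ("ABORT", 1) is in the channel from u to v) passes both checks; a pending ack bit with nothing to explain it violates A, and an ABORT from u to v in transit at the same time as an ACK from v to u violates B.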

The next two predicates C and D govern the second and third phases of the reset protocol. They deal with the parent_u variable at a node u. Intuitively, parent_u = v if v is the parent of u in the abort tree. C states that if v is the parent of u and the mode of u is Converge then one of three possibilities must be true: either there is an ack packet in transit from u to v (Case 1 in Figure 7.12), OR v has received the ack packet and has cleared its ack bit for u but has not changed its mode to Ready (Case 2 in Figure 7.12), OR there is a ready packet in transit from v to u (Case 3 in Figure 7.12). Predicate D says that if v is u's parent, then at most one of these three possibilities can be true at the same time.

As an example, consider nodes x and v in Figure 7.10, where v is the parent of x. 
In that example execution, the first case is illustrated in F4, the second in F5 and F6, 
and the third in F7. 

The next predicate, E, is crucial to the proof of termination of the protocol. It states
that the distance of a child in the abort tree is one more than the distance of its parent.
The only exception is when the parent has "abdicated" by sending a ready packet
that is currently in transit to the child. Essentially, this predicate shows that abort
trees are acyclic and have a maximum height of n, the number of nodes; the proof of
termination will essentially consist of induction on the height of the abort trees. The
predicate F is a supporting predicate required to prove that E is closed. It states that
if u sends an abort to v, then the distance in the abort packet is one more than u's
current distance.

Figure 7.12: The Three Cases for the third and fourth predicates. Node v is the parent of node u
in all three cases. Case 1: an ACK is in transit from u to v. Case 2: mode(v) is not equal to
Ready. Case 3: a READY is in transit from v to u.

Predicate G is again a supporting predicate, required to prove that some of the
other predicates are closed. Suppose there is a packet p, either a ready packet,
a user message, or an S-Ack, in transit from u to v. Then p must have been sent
when u was in Ready mode. Thus either u is still in Ready mode, or u has gone
into Abort mode since the time p was sent. In the latter case, there must be an
abort packet behind p in transit from u to v.

Predicate H governs the flow control scheme that ensures at most one message is in
transit from u to v. If freem_u[v] is true, then the FREEM_{u,v} action is enabled, and so
there must be no message in transit from u to v and also no message acks (S-Acks)
in the reverse direction. On the other hand, if freem_u[v] is false, then either a message
is in transit from u to v or a message ack is in transit in the reverse direction, but not
both.
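To make predicate H concrete, here is a minimal Python sketch of the one-message-at-a-time flow control on a single directed link (u,v). It assumes reliable FIFO delivery and ignores ABORT packets, node modes, and timing, so it is a toy model rather than the protocol's code. The class and method names are hypothetical; `to_v` stands for the S-messages in M_{u,v} and `to_u` for the S-Acks in xqueue_v[u]. The method `invariant_H` checks the condition stated by H.

```python
from collections import deque


class FlowControlledLink:
    """Toy model of the flow control on directed link (u, v).

    Hypothetical sketch: ABORT packets, modes, and timing are not modeled.
    """

    def __init__(self) -> None:
        self.freem_u = True     # freem_u[v]: u may accept a new message for v
        self.to_v = deque()     # S-messages in transit u -> v (M_{u,v})
        self.to_u = deque()     # S-Acks in transit v -> u (xqueue_v[u])
        self.delivered = []     # messages handed to the user at v

    def sendm(self, m: str) -> None:
        # A safe SENDM is only possible while freem_u is true (after FREEM).
        if not self.freem_u:
            raise RuntimeError("SENDM before FREEM")
        self.freem_u = False
        self.to_v.append(m)

    def deliver_at_v(self) -> None:
        # v delivers the message and returns an S-Ack to u.
        self.delivered.append(self.to_v.popleft())
        self.to_u.append("S-Ack")

    def receive_ack_at_u(self) -> None:
        # u consumes the S-Ack; the FREEM_{u,v} action is enabled again.
        self.to_u.popleft()
        self.freem_u = True

    def invariant_H(self) -> bool:
        if self.freem_u:
            # No S-message toward v and no S-Ack toward u.
            return not self.to_v and not self.to_u
        # Exactly one S-message or exactly one S-Ack, but not both.
        return len(self.to_v) + len(self.to_u) == 1
```

Because u can accept a new message only after the S-Ack for the previous one returns, at most one S-message is ever in transit on the link.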

Predicate I is the reason why we can get away with an outbound queue size for
links (i.e., the size of queue_u[v]) of 5. It states that the outbound queue (even after
concatenation with the channel queue) can contain at most one packet of each type:
abort, ack, and ready. Since H tells us we can have at most one S-message and at
most one S-Ack, a queue size of 3 + 1 + 1 = 5 is sufficient.
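The counting argument can be checked mechanically. The sketch below assumes a hypothetical packet representation: a tuple whose first field is a type tag (`"Abort"`, `"Ack"`, `"Ready"`, `"S-message"`, or `"S-Ack"`). It tests predicate I (the last predicate of Figure 7.14) on an extended queue and computes the worst-case occupancy permitted by I together with H.

```python
from collections import Counter
from typing import Sequence, Tuple

# Hypothetical packet representation: (type_tag, *payload).
Packet = Tuple

CONTROL_TYPES = ("Abort", "Ack", "Ready")   # at most one of each, by predicate I
DATA_TYPES = ("S-message", "S-Ack")          # at most one of each, by predicate H
                                             # (applied to (u, v) and to (v, u))


def satisfies_I(xqueue: Sequence[Packet]) -> bool:
    """Predicate I: at most one Abort, one Ack, and one Ready packet."""
    counts = Counter(p[0] for p in xqueue)
    return all(counts[t] <= 1 for t in CONTROL_TYPES)


def max_xqueue_length() -> int:
    """Worst case: one packet of each control type plus one of each data type."""
    return len(CONTROL_TYPES) + len(DATA_TYPES)
```

The value returned by `max_xqueue_length()` is the bound of 5 used in the text.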
The following shorthand is used to specify local predicates:

xqueue_u[v] (the extended queue between u and v) is the queue
formed by the concatenation of Q_{u,v} and queue_u[v].

    A1(u,v):  (Abort,*) in xqueue_u[v]
    A2(u,v):  mode(v) = Abort and parent_v = u
    A3(u,v):  Ack in xqueue_v[u]
    C1(u,v):  Ack in xqueue_u[v]
    C2(u,v):  ack_v[u] = false and mode(v) ≠ Ready
    C3(u,v):  Ready in xqueue_v[u]

Figure 7.13: Shorthand Used to Define Local Predicates



We now show that the conjunction of these predicates is a closed predicate. The 
proofs, which we relegate to the appendix, consist basically of rigorous (and somewhat 
tedious) case-analysis. 

7.6.2 Proving that the Local Predicates of the Reset Protocol are Closed

First, the local predicates of Figure 7.14 are specific to a directed link (u,v). Recall
that for local checkability we need exactly one local predicate for each edge. Hence:

Definition 7.6.2 For any edge (u,v), we let G_{u,v} be the intersection of the local
predicates in Figure 7.14, and we let L_{u,v} be the intersection of G_{u,v} and G_{v,u}.

Definition 7.6.3 Let L_{u,v} be the local predicate defined in Definition 7.6.2. Let ℒ be
the link predicate set containing L_{u,v} for every edge (u,v) in G, and let L = Conj(ℒ).
A: ack_u[v] = true iff
     one of A1(u,v), A2(u,v), or A3(u,v) holds.

B: At most one of A1(u,v), A2(u,v), or A3(u,v) holds.

C: parent_u = v implies
     mode(u) = Converge iff
     one of C1(u,v), C2(u,v), or C3(u,v) holds.

D: parent_u = v implies
     at most one of C1(u,v), C2(u,v), or C3(u,v) holds.

E: parent_u = v implies
     one of the following holds:
       dist_u = dist_v + 1 and mode(v) ≠ Ready, OR
       C3(u,v).

F: If xqueue_u[v] contains an (Abort, d) packet, then d = dist_u + 1.

G: If p is in xqueue_u[v] and p = Ready or p is an S-message or p = S-Ack, then
     either mode(u) = Ready,
     or there is an Abort in xqueue_u[v] after p.

H: Let M_{u,v} denote the concatenation of xqueue_u[v] and buffer_v[u]. Then:
     If freem_u[v] = true then
       there is no S-message in M_{u,v} and no S-Ack in xqueue_v[u];
     else exactly one of the following holds:
       there is exactly one S-message in M_{u,v}, OR
       there is exactly one S-Ack in xqueue_v[u].

I: xqueue_u[v] contains at most one (Abort,*) packet, at most one Ack packet, and at most
   one Ready packet.

Figure 7.14: Reset Protocol: Local Predicates for edge (u,v). Refer to Figure 7.13 for
an explanation of the shorthand used.
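To illustrate that these predicates refer only to the state of one link subsystem, the sketch below evaluates A through D for a directed link (u,v). It assumes a hypothetical snapshot format: `xq[(a, b)]` lists the packet type tags in xqueue_a[b], and `mode`, `parent`, and `ack` are dictionaries (with `ack[(a, b)]` standing for ack_a[b]). Only the shorthand of Figure 7.13 is used; E through I are omitted for brevity.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str]   # (a, b) identifies the directed link from a to b


class LinkSnapshot:
    """Hypothetical snapshot of the link subsystem formed by nodes u and v."""

    def __init__(self, xq: Dict[Edge, List[str]], mode: Dict[str, str],
                 parent: Dict[str, Optional[str]], ack: Dict[Edge, bool]) -> None:
        self.xq = xq          # xq[(a, b)]: packet type tags in xqueue_a[b]
        self.mode = mode      # mode[a] in {"Ready", "Abort", "Converge"}
        self.parent = parent  # parent[a] (None plays the role of nil)
        self.ack = ack        # ack[(a, b)]: ack_a[b]

    # Shorthand of Figure 7.13 (packet payloads such as distances are ignored).
    def A1(self, u: str, v: str) -> bool:
        return "Abort" in self.xq[(u, v)]

    def A2(self, u: str, v: str) -> bool:
        return self.mode[v] == "Abort" and self.parent[v] == u

    def A3(self, u: str, v: str) -> bool:
        return "Ack" in self.xq[(v, u)]

    def C1(self, u: str, v: str) -> bool:
        return "Ack" in self.xq[(u, v)]

    def C2(self, u: str, v: str) -> bool:
        return not self.ack[(v, u)] and self.mode[v] != "Ready"

    def C3(self, u: str, v: str) -> bool:
        return "Ready" in self.xq[(v, u)]

    # Predicates A to D of Figure 7.14 for the directed link (u, v).
    def pred_A(self, u: str, v: str) -> bool:
        cases = (self.A1(u, v), self.A2(u, v), self.A3(u, v))
        return self.ack[(u, v)] == any(cases)

    def pred_B(self, u: str, v: str) -> bool:
        return sum((self.A1(u, v), self.A2(u, v), self.A3(u, v))) <= 1

    def pred_C(self, u: str, v: str) -> bool:
        if self.parent[u] != v:
            return True                      # vacuous unless parent_u = v
        cases = (self.C1(u, v), self.C2(u, v), self.C3(u, v))
        return (self.mode[u] == "Converge") == any(cases)

    def pred_D(self, u: str, v: str) -> bool:
        if self.parent[u] != v:
            return True                      # vacuous unless parent_u = v
        return sum((self.C1(u, v), self.C2(u, v), self.C3(u, v))) <= 1
```

A state in which ack_u[v] = true but none of A1(u,v), A2(u,v), A3(u,v) holds is rejected by `pred_A`; this is the kind of locally detectable bad state that local checking relies on.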
The proof, which is in the appendix, consists of showing that each L_{u,v} is a closed
local predicate.

Lemma 7.6.4 For a leader edge (u,v) and transition (s, a, s') of ℛ, if s satisfies L_{u,v},
then s' satisfies L_{u,v}.

Proof: By Lemmas D.1.3, D.1.4, D.1.5, D.1.6, D.1.7, and D.1.8 in Section D.1 of the
appendix. The predicates are closed because of the code in [AAG87] and the heuristic
of removing unexpected packet transitions.

We briefly sketch what is involved in such a proof. Consider predicate A as
sketched in Figure 7.11. We essentially consider all states that satisfy this local
predicate and show that no transition can cause it to be false in the next state.
For example, suppose u is not expecting an ack from v in some state. The only
transitions that can cause u to start expecting an ack from v are the receipt of a
reset request or of an ABORT packet while u is in Ready mode. But such a transition
also causes u to send an ABORT packet to v, which leaves us in Case 1 of the predicate.

Rather than consider all possible transitions, we save some effort by first identifying
the transitions that can affect key variables, and then focusing our attention on those
transitions. This method is described in the appendix. ∎
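The closure argument can be mimicked by brute force on a deliberately tiny abstraction. The sketch below is a hypothetical four-bit model of predicates A and B for one link, not the protocol of Figure 7.7: it tracks only ack_u[v] and whether each of A1, A2, A3 holds, with one transition per way these cases can change (the guard "u is in Ready mode" is approximated by ack_u[v] = false). It enumerates every state satisfying A and B and checks that every enabled transition leads to another such state, which is what closure means.

```python
from itertools import product
from typing import Callable, Iterable, List, Tuple

State = Tuple[bool, bool, bool, bool]   # (ack_u[v], A1, A2, A3)
# A transition is a (guard, effect) pair over states.
Transition = Tuple[Callable[[State], bool], Callable[[State], State]]

TRANSITIONS: List[Transition] = [
    # u gets a reset request or an ABORT while Ready: it sends ABORT to v (Case 1).
    (lambda s: not s[0], lambda s: (True, True, s[2], s[3])),
    # v receives the ABORT and adopts u as its parent (Case 1 -> Case 2).
    (lambda s: s[1], lambda s: (s[0], False, True, s[3])),
    # v receives the ABORT and replies with an ACK at once (Case 1 -> Case 3).
    (lambda s: s[1], lambda s: (s[0], False, s[2], True)),
    # v's subtree finishes and v sends an ACK to u (Case 2 -> Case 3).
    (lambda s: s[2], lambda s: (s[0], s[1], False, True)),
    # u receives the ACK and clears ack_u[v].
    (lambda s: s[3], lambda s: (False, s[1], s[2], False)),
]


def pred_AB(s: State) -> bool:
    """A: ack_u[v] iff some case holds.  B: at most one case holds."""
    ack, cases = s[0], s[1:]
    return ack == any(cases) and sum(cases) <= 1


def is_closed(states: Iterable[State], transitions: List[Transition],
              pred: Callable[[State], bool]) -> bool:
    """True iff no enabled transition leads from a pred-state to a non-pred-state."""
    for s in states:
        if not pred(s):
            continue
        for guard, effect in transitions:
            if guard(s) and not pred(effect(s)):
                return False
    return True


ALL_STATES = list(product((False, True), repeat=4))
```

In this toy model, `is_closed(ALL_STATES, TRANSITIONS, pred_AB)` holds, whereas a predicate such as "ack_u[v] is false" is not closed, because the first transition falsifies it.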

The following theorem is immediate from the last lemma.
Theorem 7.6.5 The reset automaton ℛ can be locally checked for L using ℒ.

7.6.3 Reset Protocol is Locally Correctable 

Consider the function f defined in Figure 7.15. Let < be the trivial partial order such
that for all u, v, w, x in G, {u,v} ≮ {w,x} (i.e., no pair of neighbors is less than any
other pair of neighbors). We claim that f is a local reset function for ℛ with respect to ℒ
and partial order <.

Lemma 7.6.6 The function f defined in Figure 7.15 is a local reset function for network
automaton ℛ with respect to link predicate set ℒ = {L_{u,v}} and partial order <.
Local Reset Function f applied to node u with respect to node v

  (*First simulate the receipt of an Ack message from v*)
  If ack_u[v] = true then
      ack_u[v] := false
      If mode(u) = Converge then
          Enqueue Ack in queue_u[parent_u]
      Else if mode(u) = Ready then
          signalbit_u := true
          For all neighbors x of u do Enqueue Ready in queue_u[x]

  (*Next simulate the receipt of a Ready message from v*)
  If parent_u = v and mode(u) = Converge then
      parent_u := nil
      signalbit_u := true
      For all neighbors x of u do
          Enqueue Ready in queue_u[x]

  (*Finally correct parent_u if it has not been done by the code above*)
  (*and also clean up the message buffer and packet queue*)
  If parent_u = v then parent_u := nil
  Empty queue_u[v] and buffer_u[v]
  freem_u[v] := true

Figure 7.15: Reset Protocol: Local Reset Function
Proof: Consider any state s of ℛ and any leader edge (u,v) of G. We show each of
the properties required by a local correction function.

• Correction: In the state f(s|u, v), it is easy to check that parent_u ≠ v, ack_u[v] =
false, queue_u[v] and buffer_u[v] are empty, and freem_u[v] = true. Similarly,
in the state f(s|v, u), it is easy to check that parent_v ≠ u, ack_v[u] = false,
queue_v[u] and buffer_v[u] are empty, and freem_v[u] = true. Thus it follows that
(f(s|u, v), nil, nil, f(s|v, u)) ∈ L_{u,v}.

• Stability: We need to show the following fact for any neighbor w of v:

If (s|u, s|(u,v), s|(v,u), s|v) ∈ L_{u,v}, then (s|u, s|(u,v), s|(v,u), f(s|v, w)) ∈ L_{u,v}.

If we examine the code for f(s|v, w), we see that f(s|v, w) is the same state
that would occur at v in an execution that starts in state s and consists of
the following sequence of actions:

- A RECEIVE(ACK) action at v for a packet from w (see Figure 7.7), followed by

- A RECEIVE(READY) action at v for a packet from w (see Figure 7.7), followed by

- A hypothetical internal action that executes the last three lines of
  Figure 7.15, applied to node v with respect to node w.

We know that the state at v resulting after the RECEIVE(ACK) and RECEIVE(READY)
actions must satisfy the stability condition, because we have already proved this in
Theorem 7.6.5.

Now consider the hypothetical internal action applied to v. This internal action can
only change parent_v from w to nil. Also, a careful look shows that it never changes
the value of mode(v), since it is only applied after the simulated processing of a READY
packet from w. The predicates in L_{u,v} depend only on the value of mode(v), ack_v[u],
and the predicate parent_v = u, and none of these values is affected by the hypothetical
internal action. Thus L_{u,v} is unaffected by the hypothetical internal action.

Thus the result of executing all three actions must satisfy the stability condition. 
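For readers who prefer executable notation, the following is a hypothetical Python transcription of Figure 7.15. It follows the figure as printed: it performs only the steps shown there and does not model any other effects that the full RECEIVE actions of Figure 7.7 may have (for example, on mode(u)). The `NodeState` record, its field names, and the function name `local_reset` are assumptions chosen to mirror the variables in the text; the guard on parent_u in the Converge branch is an addition for corrupted states, not part of the figure. Whatever the starting state, the result satisfies the Correction conditions listed in the proof above.

```python
import copy
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class NodeState:
    """Hypothetical record of the reset-protocol variables at one node u."""
    neighbors: List[str]
    mode: str                       # "Ready", "Abort", or "Converge"
    parent: Optional[str]           # parent_u (None plays the role of nil)
    ack: Dict[str, bool]            # ack_u[x]: u expects an Ack from x
    queue: Dict[str, List[str]]     # queue_u[x]: outbound packet queue
    buffer: Dict[str, List[str]]    # buffer_u[x]: inbound message buffer
    freem: Dict[str, bool]          # freem_u[x]
    signalbit: bool = False


def local_reset(state: NodeState, v: str) -> NodeState:
    """Apply the local reset function f to node u with respect to neighbor v."""
    u = copy.deepcopy(state)        # f returns a new local state

    # First simulate the receipt of an Ack message from v.
    if u.ack[v]:
        u.ack[v] = False
        if u.mode == "Converge":
            # Added guard: skip if parent_u is nil or not a neighbor.
            if u.parent in u.queue:
                u.queue[u.parent].append("Ack")
        elif u.mode == "Ready":
            u.signalbit = True
            for x in u.neighbors:
                u.queue[x].append("Ready")

    # Next simulate the receipt of a Ready message from v.
    if u.parent == v and u.mode == "Converge":
        u.parent = None
        u.signalbit = True
        for x in u.neighbors:
            u.queue[x].append("Ready")

    # Finally correct parent_u and clean up the queue and buffer for v.
    if u.parent == v:
        u.parent = None
    u.queue[v].clear()
    u.buffer[v].clear()
    u.freem[v] = True
    return u
```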
Lemma 7.6.7 Let f be the reset function described in Figure 7.15. Then ℛ is locally
correctable to L using link predicate set ℒ, reset function f, and partial order <.

Proof: Follows from Lemma 7.6.6 and Theorem 7.6.5. ∎

Theorem 7.6.8 ℛ⁺ stabilizes to the behaviors of ℛ(t_p)|L in time t_q, where t_q and t_p
are constants.

Proof: Follows directly from the Local Correction Theorem (Theorem 5.4.3) and
Lemma 7.6.7. Notice that height(<) = 1, since < is the trivial partial order. ∎

7.7 The behaviors of a reset protocol after it stabilizes

We have already shown that ℛ⁺ stabilizes to the behaviors of ℛ(t_p)|L in constant time.
Recall that ℛ(t_p)|L is identical to the automaton ℛ except that the node and link
delays are increased by a constant factor, and all link predicates hold.

We now show that ℛ(t_p)|L solves the reset problem RP. Because the set
of behaviors specified by RP remains unchanged after scaling by constant factors, it
suffices to show that ℛ|L solves RP. Thus, in the remainder of this
section and in the appendix, we show that every behavior β of ℛ|L is in RP.

We relegate the proof of this theorem to the appendix. However, in this section, we
intuitively explain why any behavior β of ℛ|L is timely, consistent, and causal. The
intuition provided in this section should help the reader understand the proof in the
appendix.

In the proofs, when we talk of a state s or an execution α, we mean a state or
execution of ℛ|L. Recall that we defined a derived variable status(u), which is
on if mode(u) = Ready and signalbit(u) = false, and off otherwise. From the code, it is
easy to see that u only sends and receives messages when status(u) = on.

The first important lemma is what we call the Termination Lemma. Consider any node u,
and assume that mode(u) ≠ Ready in some state s_i of an execution α. The lemma states
that mode(u) changes to Ready within O(n) time after s_i.
7.7.1 Why the Termination Lemma Works

Let c be the worst-case time for a packet queued at node u to reach neighbor v. It is
not hard to see that c is a constant, because the outbound queue size at each node
is at most 5 and because of the properties of a UDL. In what follows, we say
that a node u is a root in some state s of the reset protocol if mode(u) = Abort and
parent_u = nil in state s.

By assumption, mode(u) ≠ Ready in state s_i. Hence u must
either be a root or have a parent (say v) in state s_i. By using invariants A and C
repeatedly, we can obtain a chain of nodes starting with u such that each node is the
parent of the previous node. The chain must end either with a READY packet
or with a root node r such that mode(r) = Abort in s_i. This is shown in
Figure 7.16. We also know from invariant E that the distance of each node in the chain
is one more than that of its parent. Thus no node can occur more than once in this
chain, and hence the chain consists of at most n nodes.

(The following is a more detailed argument that explains why each chain must end
in a READY packet or a root r. If mode(u) = Abort, then either u is a root or u has
a parent, say v. In the latter case, by A, ack_v[u] = true and so mode(v) = Abort. If
mode(u) = Converge, then u has a parent, say v. By C, either there is an ACK in
transit from u to v (in which case, by A, mode(v) = Abort), or mode(v) ≠ Ready, or
there is a READY packet in transit from v to u. Thus either u is a root, or u has a
parent v such that mode(v) ≠ Ready, or there is a READY packet in transit from v to
u. We now repeat this argument until we arrive either at a root node or at a READY
packet.)
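The chain construction can be sketched in code. The hypothetical helper below walks parent pointers from u, assuming the state satisfies the first alternative of invariant E on every edge it visits (a child is exactly one farther than its parent); the alternative in which a READY packet is in transit is not modeled, so the sketch only covers chains that end at a root. Because distances strictly decrease along the chain, no node repeats, so the walk has at most n nodes; if E is violated (for example, on a cycle), the helper raises an error instead of looping.

```python
from typing import Dict, List, Optional


def parent_chain(u: str, parent: Dict[str, Optional[str]],
                 dist: Dict[str, int]) -> List[str]:
    """Follow parent pointers from u to a node whose parent is nil.

    Checks invariant E along the way: dist[child] == dist[parent] + 1.
    """
    chain = [u]
    while parent[chain[-1]] is not None:
        child = chain[-1]
        p = parent[child]
        if dist[child] != dist[p] + 1:
            raise ValueError(f"invariant E violated on edge ({child}, {p})")
        chain.append(p)
    return chain
```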

Consider the case where the chain ends with a READY packet. In this case, it is
not hard to see that within O(n) time after s_i, a READY packet will reach u, which will
cause u to go into Ready mode.

Now consider the case where the chain ends with a root r. Suppose we can
show that within O(n) time after s_i there is an action a_j in which r changes its mode to
Ready and sends a READY packet to all its children. Then, by the argument of the
previous paragraph, we are done within O(n) time after a_j. So all we
have to do is show that within O(n) time after s_i, r changes its mode to Ready.

If the mode of r is Abort, then by definition r must have some neighbor v that it
expects an ack from (i.e., ack_r[v] = true). By invariant A, this means that (see
Figure 7.11) either there is an abort in transit from r to v, or an ack in transit from v
to r, or v is also in Abort mode and r is the parent of v. If v is in Abort mode,
we can continue this argument inductively to produce a chain of nodes starting with r
such that each node is the parent of the next node. We also know from invariant E that
this chain consists of at most n nodes. At the end of this chain (see Figure 7.17),
there must be either an abort or an ack. We call a chain that ends with an abort
packet an abort chain, and a chain that ends with an ack packet an ack chain.

Figure 7.16: The two cases that can occur if node u's mode is not Ready. Each node in a chain
is the parent of the node below it.

Figure 7.17: The two cases that can occur if node u is expecting an ack on the link to v.
Recall that c is the constant that bounds the worst-case time for a packet queued
at a node to reach a neighbor. Observe that in c units of time any abort chain must
either increase in size by 1 or be converted into an ack chain. Since the size of an
abort chain cannot increase beyond n (the number of nodes), within O(n) time every
abort chain must have been converted into an ack chain. Similarly, in c units of time any
ack chain must decrease in size by 1. Thus within O(n) time every ack chain disappears.

The upshot is that within O(n) time, ack_r[v] becomes false. Since this happens
for every neighbor v with ack_r[v] = true, within O(n) time after s_i, r changes its mode
from Abort to Ready and we are done.
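The O(n) bound can be illustrated with an idealized calculation. The sketch below assumes that every packet takes exactly c time units, that the abort tree is given in advance as a hypothetical dictionary of children, and that no other reset interferes; none of these assumptions come from the protocol itself. Each node forwards the abort to its children when the abort arrives and acks its parent once every child has acked, so the root returns to Ready at time 2·c·height, which is at most 2·c·(n − 1).

```python
from typing import Dict, List


def ack_done_time(node: str, children: Dict[str, List[str]],
                  abort_arrival: float, c: float) -> float:
    """Time at which `node` has received acks from all its children.

    Idealized model: every packet takes exactly c time units.
    """
    kids = children.get(node, [])
    if not kids:
        return abort_arrival     # a leaf can ack as soon as the abort arrives
    # Each child receives the abort c later, and its ack needs another c to return.
    return max(ack_done_time(k, children, abort_arrival + c, c) + c for k in kids)


def root_ready_time(root: str, children: Dict[str, List[str]], c: float) -> float:
    """Time at which a root that starts the reset at time 0 returns to Ready."""
    return ack_done_time(root, children, 0.0, c)
```

For a path of five nodes with c = 1 the root is Ready at time 8 = 2·1·4, while for a star with four leaves it is Ready at time 2; the worst case grows linearly with the height of the abort tree.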

A formal proof can be made based on these arguments. The appendix contains a 
formal statement of the lemma (Lemma D.2.1) but omits a detailed formal proof. 

7.7.2 Why behaviors of the reset automaton are timely, consistent, and causal

Recall that we defined a derived variable status(u), which is on if mode(u) =
Ready and signalbit(u) = false, and off otherwise. From the code, it is easy to see that
u only sends and receives messages when status(u) = on.

The first important tool is what we call the Signal Lemma. Consider any execution
α of ℛ|L, any node u, and any state s_i in α. The Signal Lemma says that if
status(u) = off in state s_i, then a SIGNAL_u event occurs within O(n) time after s_i in α.
This holds because if signalbit_u = true in s_i, then a SIGNAL_u event must occur within
constant time of s_i by the timing conditions. On the other hand, if mode(u) ≠ Ready
in s_i, then the Termination Lemma tells us that within O(n) time after s_i we reach a state
s_j in which mode(u) = Ready. If s_j is the first such state after s_i, then by the code we
know that u cannot change its mode to Ready without also setting signalbit_u = true.
Then, as before, a SIGNAL_u event must occur within constant time of s_j in α.

Now we consider an execution α of ℛ|L and sketch why the behavior corresponding
to α satisfies the timeliness, consistency, and causality properties.

Proving the Timeliness Property 

Consider first the timeliness property: 

1. Normal Receipt of Messages: We need to show that there is some constant c
such that every receive event that occurs at time greater than β.start + c·n
in any execution α is normal. Also, if a_j is any normal receive event and a_i is the
send corresponding to a_j, then a_j occurs within O(n) time after a_i.

This follows because any message m in transit from, say, v to u (i.e., stored
either in the queue at v, the link from v to u, or the buffer at u) cannot remain
in transit for more than O(n) time. If a message m is stored either in the queue
at v or in the link from v to u, then (by the properties of the link and the fact
that the queue holds at most 5 packets) m will be stored in the
buffer at u within constant time. Next, we argue that within O(n) time after a state s_i
in which a message m is in the buffer at u, m will either be
delivered or be "flushed" by an ABORT packet sent from v (see Figure 7.18).

If status(u) remains on for a constant amount of time after s_i, then the message
m will be delivered. Thus the only other possibility is that m remains
in the buffer because status(u) is off within constant time after s_i. But in that case,
by the Signal Lemma, there will be a SIGNAL_u event within O(n) time after s_i. With
a little work (see appendix) we can show that such SIGNAL_u events cannot keep
occurring at node u for a linear amount of time without causing v to send an
ABORT packet to u; this will flush the buffer at u.

2. Periodic Free Events: Consider any i-suffix γ of execution α. Then in γ either
a FREEM_{u,v} event occurs within constant time, or a signal event occurs at u within O(n)
time, or a signal event occurs at v within O(n) time.

This follows from the following observation. Let c be a sufficiently large
constant time. Suppose either status(u) or status(v) is off within c time after the
start of γ. Then we are done by the Signal Lemma. If this is not the case,
and c is sufficiently large, then any message m in transit from u to v will be
delivered in constant time; also any S-Ack in transit from v to u will be
delivered in constant time; this will result in freem_u[v] = true in constant time.
In constant time after freem_u[v] becomes true (assuming that c is large enough so
that status(u) remains on in this interval), a FREEM_{u,v} event will occur.

Figure 7.18: Two Cases for how a message is removed from a link

3. Timely Message Delivery: Suppose a_j is a safe SENDM_{u,v}(m) event in γ.
Then either a RECEIVEM_{u,v}(m) event occurs within constant time after a_j, or a signal
event occurs at u within O(n) time, or a signal event occurs at v within O(n)
time.

This follows from an observation similar to the one used to show periodic
free events. Let c be a sufficiently large constant time. Suppose either status(u)
or status(v) is off within c time after a_j. Then we are done by the Signal Lemma.
If not, arguments similar to the ones above show that message m will be
accepted and stored at u and then sent to v, where it will be delivered in constant
time. We assume that c is large enough so that status(u) and status(v) remain
on in this interval.

4. Signals at a Node Induce Signals at Neighbors: There is some constant c
such that for every SIGNAL_u event a_j that occurs at time greater than β.start + c·n,
there is a SIGNAL_v event that occurs within linear time before or after a_j.



Figure 7.19: A signal event at u that occurs sufficiently "late" must be preceded in linear time by the
sending of an Abort packet to v. This in turn causes a signal event to occur within linear time at v.

First, within linear time of the start of the execution α corresponding to β, there
must be some state s_h in which mode(u) = Ready (by the Termination Lemma). Within
constant time after s_h, there must be some state s_i in which signalbit_u = false.
This follows because signalbit_u cannot remain true for more than a constant time without
a SIGNAL_u action occurring, which sets signalbit_u = false. Thus any SIGNAL_u
action a_j that occurs after state s_i must have been "caused" by u receiving a
reset request or an ABORT packet while in Ready mode. Let this action be a_k;
by the Termination Lemma, a_k occurs within linear time before a_j. As part of
action a_k, the code also sends an ABORT packet to neighbor v, as sketched in
Figure 7.19. Thus within constant time after a_k, this ABORT packet arrives at v
and results in a state in which status(v) = off. Thus, by the Signal Lemma, within
O(n) time after a_k a SIGNAL_v event occurs. Since a_k occurs within linear time
before a_j, the SIGNAL_v event occurs within linear time before or after a_j.

Proving the Consistency Property 

Consider now the consistency property. As in the Simple Reset Protocol, the consistency
conditions follow from the sending of ABORT packets between signal intervals.
We define a signal interval S_u at u and a signal interval S_v at v to be mates iff a normal
message sent in S_u is received in S_v or vice versa. We can show that the mating
relation is well-defined and symmetric because of the following two properties.

The first property, which we call send consistency, states that messages sent in a
signal interval at v can be received in at most one signal interval at u; conversely,
messages received in a signal interval at u could have been sent in at most one signal
interval at v. This will help establish that each signal interval at u can have at most
one mated signal interval at v.

Figure 7.20: What happens between the sending of a message and a later Signal event.

Send Consistency: Let a_j and a_k be any two normal receive events at u from v
in β. Let a_l and a_m be the send events corresponding to a_j and a_k, respectively. Then
there is a SIGNAL_v event between a_l and a_m iff there is a SIGNAL_u event between a_j
and a_k.

Consider Figure 7.20. This figure shows that if there is a SIGNAL_v event after a_l,
then there must be an ABORT packet sent in between from v to u. On receipt of this
packet, all packets in the buffer at u will be flushed and status(u) must become off.
Next, u will not deliver any messages until it has performed a SIGNAL_u and status(u) is
on again. But any messages sent by v after the SIGNAL_v event will arrive (by the FIFO
property of the queue and link) after the ABORT packet arrives at u. This guarantees
that such messages will not be delivered until u performs its SIGNAL_u event. Similar
(but more complicated) arguments can be used to show the other direction of this claim.
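The FIFO part of this argument can be illustrated with a toy model. The sketch below is a hypothetical simplification: it assumes that v sends an ABORT to u before each SIGNAL_v (as argued above), that u performs SIGNAL_u after each ABORT before handling the next packet, and that u signals for no other reason; loss of buffered messages through flushing is not modeled. Under these assumptions, two messages from v are sent in the same signal interval at v exactly when they are received in the same signal interval at u, which is the shape of the Send Consistency property.

```python
from typing import Dict, List, Tuple

Packet = Tuple[str, str, int]   # (kind, message, v's signal interval at send time)


def sender_stream(events: List[str]) -> List[Packet]:
    """Turn v's events into a FIFO packet stream tagged with v's signal interval.

    Each "SIGNAL" event is preceded by an ABORT packet sent to u.
    """
    interval, stream = 0, []
    for e in events:
        if e == "SIGNAL":
            stream.append(("ABORT", "", interval))
            interval += 1
        else:
            stream.append(("MSG", e, interval))
    return stream


def receiver_intervals(stream: List[Packet]) -> Dict[str, Tuple[int, int]]:
    """Deliver messages at u in FIFO order.

    Returns, for each message, (v's interval at send, u's interval at receipt).
    An ABORT makes u flush and then perform SIGNAL_u, starting a new interval.
    """
    u_interval, result = 0, {}
    for kind, msg, v_interval in stream:
        if kind == "ABORT":
            u_interval += 1
        else:
            result[msg] = (v_interval, u_interval)
    return result
```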

Next, we show a second property which states (in essence) that a signal interval at 
u cannot send messages to and receive messages from different signal intervals at v. 
This will help show that the mating relation is symmetric. 

Send-Receive Consistency: Let a_j be a normal receive event at u from v and
let a_m be a normal receive event at v from u. Let a_l and a_k be the send events
corresponding to a_j and a_m, respectively. Then there is a SIGNAL_v event between a_l
and a_m iff there is a SIGNAL_u event between a_j and a_k.
Consider Figure 7.20. This figure shows that if there is a SIGNAL_v event after a_l,
then there must be an ABORT packet sent from v to u before v performs its SIGNAL_v
event. When this packet arrives at u, there are two possibilities. Suppose the mode of u
is not equal to Ready at this point. If u has sent any message m to v in the past, then
u must have sent an ABORT packet to v after message m. On the other hand, if the
mode of u is Ready when the ABORT packet from v arrives, then u will immediately
send an ABORT packet to v. In either case, the ABORT packet from u to v will arrive
at v before the ACK from u to v (Figure 7.20 shows the second case). But
when the ABORT packet arrives at v, it will flush out any messages in transit from u
to v; it is only later that the SIGNAL_v event can be performed.

This effectively means that any messages sent by u after the ABORT is received at
u can only be delivered at v after the SIGNAL_v event. Similar arguments can be used
to complete the proof of this claim.

Now we show the third property of a consistent behavior listed in Definition 7.3.5. 

Successful Sending of Messages: Between any safe SENDM_{v,u}(m) event and a
later FREEM_{v,u} event, there is either a RECEIVEM_{v,u}(m) event or a SIGNAL_v event.

Call the SENDM_{v,u}(m) event a_i. If status(v) = off in state s_i, message m will
be dropped; but in this case, by the Signal Lemma, a SIGNAL_v event will occur after s_i.
In the period between s_i and the SIGNAL_v event, since status(v) = off, the code ensures
that no FREEM_{v,u} event can occur. So suppose m is placed in queue_v[u] in s_i. Now
we return to Figure 7.18. We know that either m is delivered to u (Case 1) or m is
destroyed by a later ABORT packet (Case 2). In either case, we know from predicate H that
freem_v[u] will be false until m is no longer in transit, and so no FREEM_{v,u} can occur in
the interim. In Case 2, after the ABORT is sent by v, status(v) will remain off until a
SIGNAL_v occurs. Thus in this period as well no FREEM_{v,u} event can occur.

Next, we show the fourth property of a consistent behavior listed in Definition 
7.3.5. 

Mating of Final Signal Intervals: Let a_j be a normal receive event at u from
v in α, and let a_l be the send corresponding to a_j. Then there is no SIGNAL_u event after
a_j iff there is no SIGNAL_v event after a_l.

This follows from Figure 7.20. If there is a SIGNAL_v event after a_l, there must be an
ABORT packet that will be received at u after a_j. This will result in status(u) becoming
off after a_j. The Signal Lemma now tells us that there will be a SIGNAL_u event after
a_j. The reverse argument is similar.

Finally, the fifth property of a consistent behavior (i.e., the mating relation
preserves temporal ordering) follows in essence from the fact that the UDLs are FIFO
links and the fact that ABORT packets sent between signal intervals flush the links and
buffers of previously sent messages.

Proving the Causality Property 

To show that the behavior corresponding to execution α is causal, we prove the two
properties of a causal behavior:

1. There is some constant c such that every signal event a_k in α that occurs at time
greater than β.start + c·n is preceded by a request event a_j that occurs within linear
time before a_k.

It is sufficient to show that there is some constant c such that the following is
true: if we consider any interval [s_i, s_k] in which no reset request occurs and
such that s_k.time − s_i.time ≥ c·n, then a_k cannot be a signal event. The required
property follows from this because it implies that if a_k is a signal event, then a
reset request must have occurred within linear time before a_k.

If we choose c large enough, then we can break the interval [s_i, s_k] into four
subintervals [s_i, s_{i'}], [s_{i'}, s_j], [s_j, s_{j'}], and [s_{j'}, s_k] such that:

• Every node u has had mode(u) = Ready in the interval [s_i, s_{i'}]. (More
precisely, for every node u there is some state s_l in the interval [s_i, s_{i'}] such
that s_l.mode(u) = Ready.)

• Every node u has had mode(u) = Ready in the interval [s_{i'}, s_j].

• Every node u has had mode(u) = Ready in the interval [s_j, s_{j'}].

• For any u, any signal event enabled in s_{j'} is guaranteed to occur in the
interval [s_{j'}, s_{k-1}].

Intuitively, we choose the first three subintervals to be long enough that every
node has a chance to go to Ready once in each subinterval. The Termination
Lemma tells us that this can be done using subintervals whose duration
is O(n). Finally, the fourth subinterval needs to be long enough that any
signal events enabled at its start occur before its end.
By the timing conditions (see code), this can be done using a subinterval whose
duration is a constant.

First, note that a node u can become a root (i.e., mode(u) = Abort and parent_u = nil)
only through two events: first, a reset request at node u; and second, the
receipt of an (ABORT, n') packet with the distance variable at the maximum
value. We call the second transition a spurious reset request. We first claim
that a spurious reset cannot occur in any state in which the following bounded
distance predicate holds: for all nodes u, if parent_u = nil then dist_u = 0. This
follows because if there is an (ABORT, n') packet in transit from, say, v to u, we can apply
A and B repeatedly to obtain a chain of nodes ending with a root node r. If
dist_r = 0, then by E and F we arrive at a contradiction, since any ABORT
packet must then carry a distance less than n − 1.

Next, we claim that all spurious reset requests disappear after the first subinterval.
This is because the bounded distance predicate must hold after the first
subinterval: if any node u becomes a root after the first subinterval, then, since it was
Ready at some time during the first subinterval, it must have become a root through either
a reset request or a spurious reset request; but either of these transitions also
sets dist_u = 0.

Next, we claim that for every node u, mode(u) = Ready at the end of the third
subinterval. This follows from the intuitive argument below.

Recall that a root of an abort tree is a node u with parent_u = nil and mode(u) =
Abort. We first claim that any roots of abort trees that existed at the start of
the second subinterval no longer exist by the end of the second subinterval.
This is because (by choice of subinterval) any such root node must have changed
its mode to Ready by the end of the second subinterval. Since no reset
requests occur in the entire interval, and no spurious requests occur after the
first subinterval, no new roots can be created after the first subinterval. Thus
there are no roots by the end of the second subinterval.

Now consider any node u. If there are no roots at the end of the second subinterval, u cannot enter Abort mode in the third subinterval. This is because in any state s in which s.mode_u = Abort there must be a root node. (This in turn follows by applying predicate B repeatedly.) But we know that u changed its mode to Ready somewhere in the third subinterval. Also, u cannot leave Ready mode (for Converge or Abort) without first entering Abort mode. Thus, u must be in Ready mode by the end of the third subinterval.

Finally, if no real or spurious reset requests occur in the fourth subinterval and every node is Ready by the end of the third subinterval, then no signal event can occur at the end of the fourth subinterval. This is because we choose the fourth subinterval such that any signal actions enabled at the start of this subinterval would occur before the end of this subinterval.

2. A SIGNAL_u event occurs within c · n time after any REQUEST_u event in α.

This follows easily because, immediately after the REQUEST_u event, status_u = off. The claim now follows from the Signal Lemma.

Thus, as we show more formally in Theorem D.2.28, every behavior of ℛ|L is in RP. This can be used to show:

Theorem 7.7.1 ℛ+ stabilizes to the behaviors in problem RP in constant time.

Proof: Follows directly from Theorem 7.6.8 and Theorem D.2.28. □
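The step used in the proof of property 1, that any node in Abort mode has a root above it (obtained by applying predicate B repeatedly), can be illustrated by walking parent pointers. The following Python sketch is hypothetical: it assumes, for illustration only, that predicate B guarantees that the parent of an Abort-mode node is also in Abort mode and has a strictly smaller dist value. The exact form of predicate B is given with the code of the reset protocol, not here.

```python
from typing import Dict, Hashable, Optional


def find_root(u: Hashable,
              parent: Dict[Hashable, Optional[Hashable]],
              mode: Dict[Hashable, str],
              dist: Dict[Hashable, int]) -> Hashable:
    """Follow parent pointers from an Abort-mode node to the root of its abort tree.

    Assumed (hypothetical) reading of predicate B: if mode[u] == "Abort" and
    parent[u] = w is not None, then mode[w] == "Abort" and dist[w] < dist[u].
    Because dist strictly decreases along the walk, no node is visited twice,
    so the walk visits at most n nodes and terminates.
    """
    if mode[u] != "Abort":
        raise ValueError(f"{u} is not in Abort mode")
    while parent[u] is not None:
        w = parent[u]
        if mode[w] != "Abort" or dist[w] >= dist[u]:
            raise ValueError(f"edge {u} -> {w} violates the assumed predicate B")
        u = w
    return u  # parent[u] is None and mode[u] == "Abort": a root
```

For example, with parent = {"r": None, "a": "r", "b": "a"}, all three nodes in Abort mode and dist values 0, 1 and 2, the walk from "b" ends at the root "r".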

7.8 Local Correctability and Dynamic Network Protocols

A dynamic network is a network in which faults are limited to link failures: links 
can crash and recover in arbitrary fashion. A dynamic protocol is a protocol that 
works correctly in dynamic networks. If we assume that the topology (and the list 
of neighbors of each node) eventually stabilizes, then any stabilizing protocol P will 
eventually work correctly in a dynamic network. This is because any finite sequence 
of link failures can only leave P in some arbitrary state. 

A large number of protocols have been designed for dynamic networks. Dynamic 
protocols are useful because the most common faults in real networks are link and 
node crashes. Many of these existing protocols have not explicitly been designed to be 
stabilizing. However, in this section, we conjecture that a number of dynamic protocols
can be made locally correctable. 

We start with the reset protocol described in [AAG87], on which the reset protocol described in this chapter is based. This protocol was originally designed for dynamic networks. Thus, besides the actions described in this chapter, the protocol in [AAG87] had actions for link failure and recovery: for every node u and every neighbor v of u, the protocol had an input action LINK_UP_{u,v} (corresponding to the link to v coming up at node u) and an input action LINK_DOWN_{u,v} (corresponding to the link to v coming down at node u).

Next, consider the reset function (Figure 7.15) used in this chapter for the reset protocol. The reset function applied to a node v with respect to a neighbor u is exactly:

• The code performed in [AAG87] for a LINK_DOWN_{u,v} event, immediately followed by

• The code performed in [AAG87] for a LINK_UP_{u,v} event.

In other words, we can obtain a local reset function by simulating a link failure 
immediately followed by a link recovery. Is this a coincidence? 

We present a rough (but incomplete) argument as to why this might work. First, consider the stability condition for local correctability. Consider any neighbor w of u. Suppose that the (u,w) subsystem is in a good state, i.e., in a state that belongs to L_{u,w}. Now, when the link to v comes up or goes down, the original protocol had to preserve the stability of L_{u,w}. Thus the code for the LINK_UP_{u,v} and LINK_DOWN_{u,v} events preserves the stability of L_{u,w}.

Now consider the correction condition. Clearly it is possible in the dynamic protocol to have the link between u and v go down simultaneously at both ends and then come up simultaneously at both ends. When the link comes up, it should come up with no messages in the links and with both u and v in states that satisfy L_{u,v}.

Both arguments given above are incomplete. For example, consider our "proof" of the stability condition. We claimed that if the (u,w) subsystem was in a good state, the LINK_DOWN_{u,v} events would preserve the stability of L_{u,w}. To make a more careful argument, we have to add the following local extensibility condition: for every state a ∈ L_{u,w}, there is some valid global state s of the network in which the (u,w) subsystem is in state a and the link from node u to node v is considered to be up at node u. Since s is a valid state of the original protocol, and LINK_DOWN_{u,v} can occur in state s, taking the LINK_DOWN_{u,v} action must result in a new valid state s'. But since s' is a valid state of the original protocol, it must satisfy all local predicates, including L_{u,w}.

It is possible to formalize the intuitive arguments by adding similar local extensi- 
bility conditions, and showing that the protocol in [AAG87] satisfies these conditions. 
We will not do so here. 
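The observation that a local reset function can be obtained by simulating a LINK_DOWN event immediately followed by a LINK_UP event is, at its core, function composition. The following Python sketch illustrates only that pattern; the per-link state `LinkView` (an up/down flag, a queue of pending messages and a sequence number) and the bodies of `link_down` and `link_up` are hypothetical stand-ins, not the actual [AAG87] handlers.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LinkView:
    """Hypothetical state that node u keeps about its link to neighbor v."""
    up: bool
    queue: tuple = ()
    seq: int = 0


def link_down(view: LinkView) -> LinkView:
    # Hypothetical LINK_DOWN handler: mark the link down and discard queued messages.
    return replace(view, up=False, queue=())


def link_up(view: LinkView) -> LinkView:
    # Hypothetical LINK_UP handler: mark the link up and restart the sequence number.
    return replace(view, up=True, seq=0)


def local_reset(view: LinkView) -> LinkView:
    """Reset function obtained by simulating a link failure followed by a recovery."""
    return link_up(link_down(view))
```

For instance, local_reset(LinkView(up=True, queue=("m1",), seq=7)) yields LinkView(up=True, queue=(), seq=0): the simulated failure discards pending messages, and the simulated recovery restarts the link.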

7.9 Summary 

The three main ideas in this chapter are as follows: 

First, we have given a new definition of the correctness of a reset protocol in 
terms of its external behaviors. We have seen that the resulting definition is, in 
some sense, a generalization of the synchronization guarantees offered by a Data Link 
protocol. However, while a Data Link protocol synchronizes two nodes, a reset protocol 
synchronizes multiple nodes. Notice that our definition specifies the behavior of the 
reset protocol when reset requests are continuously being made and not just in the 
event that there is a last reset request. The definition we used in our original paper 
([APV91b]) only specified the behaviors in the event that there is a final reset request. 
However, in trying to apply the reset protocol (for instance, in Chapter 8) we soon 
found a need for the present specification. 

Second, we have applied the Local Correction theorem to stabilize a version of 
the reset protocol described in [AAG87]. We had to make some subtle changes to the 
original protocol to make it locally checkable and correctable. 

Third, we have conjectured that many locally checkable protocols that work in 
dynamic networks can be made locally correctable. To obtain a reset function, we 
concatenate the code that the original protocol used for a link down event with the 
code used for a link up event. As another example, Spinelli [Spi88a] describes a virtual 
circuit protocol that works in dynamic networks. This protocol appears to be locally 
checkable and it appears that we can use the link up and link down code to create a 
reset function. Interestingly, Spinelli [Spi88a] makes his protocol stabilizing by sending 
a message periodically from the source of the virtual circuit to the destination of the
virtual circuit and back. He also uses a timer whose value is proportional to the 
maximum end-to-end delay in the network. If, as we conjecture, local checking and 
correction is applicable to Spinelli's protocol, the resulting protocol will be simpler 
and faster than the one presented in [Spi88a]. 
Chapter 8

Global Correction Theorem

The previous three chapters have been concerned with local checking and local correc- 
tion. This chapter marks an important transition as we move to local checking and 
global correction. For the next two chapters, we will study the use of the stabilizing 
reset protocol of Chapter 7 for global correction. In this chapter, we will prove a 
Global Correction Theorem. This theorem states that any locally checkable protocol 
can be stabilized in time proportional to the number of network nodes using global 
correction. Thus global correction removes the need for the original protocol to be 
locally correctable but pays a price in terms of stabilization time. In the next chapter, 
we will apply global correction to a simple synchronizer protocol [Awe85]. 

The focus of this chapter is the Global Correction Theorem. The Global Correction 
theorem should be contrasted with the Local Correction Theorem (Theorem 5.4.3) of 
Chapter 5. The extra price paid for Global Correction is a stabilization time that is 
proportional to the number of network nodes instead of the height of some underlying 
partial order. 

A second important idea contained in this chapter is the concept of one-way check- 
ability. One-way checking is a special case of local checking that results in simpler 
and more efficient stabilizing protocols. We illustrate the two ideas in this chapter by 
using a simple spanning tree protocol. The spanning tree protocol is locally checkable 
and hence can be stabilized using the Global Correction Theorem. However, because 
the Spanning Tree protocol is also one-way checkable, we can create a simpler (and 
more efficient) stabilizing spanning tree protocol. 
This chapter is organized as follows. In the first two sections, we state and prove
the Global Correction Theorem. In Section 8.3 we describe a locally checkable protocol 
to compute a spanning tree. After all local predicates hold, this protocol computes a 
spanning tree in time proportional to the diameter of the network. Because it does not 
appear that the spanning tree protocol is locally correctable, the methods of Chapter 
5 do not seem applicable. However, the Global Correction Theorem applies to this 
spanning tree protocol. 

Next, in Section 8.4 we explain the concept of one-way checkability, and show 
that the spanning tree protocol is one-way checkable. We combine this observation 
with the Global Correction Theorem to yield a simple stabilizing spanning tree
protocol. Finally, in Section 8.6 we quickly sketch how Global Correction can also be 
applied to the design of a stabilizing protocol for topology maintenance in dynamic 
networks. 



8.1 Statement of Global Correction Theorem 

In this section and the next, we show that any locally checkable protocol M can be 
efficiently stabilized using a reset protocol. The basic idea in the proof of the Global 
Correction theorem is extremely similar to the transformation used in Chapter 5 (see 
Figure 5.6 and Figure 5.7) for the proof of the Local Correction theorem. As before, 
we augment the original automaton with actions to perform local snapshots on every 
(u,v) subsystem. However, there are two crucial differences in the proof of Global 
Correction: 

• First, we replace all links (UDL's) in the original protocol by a stabilizing reset 
protocol for graph G as described in Chapter 7. In other words, each node u 
now communicates with its neighbors using the interfaces provided by the reset 
subsystem. Now, the reset protocol offers an interface that is identical to a UDL 
except that packets are replaced by messages. Thus we also have to replace the 
actions that send and receive packets in the node automata with actions to send 
and receive messages. 

• Second, when a violation is detected in the (u,v) subsystem, a global reset request is made using the REQUEST_u action offered by the reset protocol interface. Then, when a SIGNAL_u action occurs, node u will essentially do a "local restart" by initializing its state. This will guarantee that no more violations of the (u,v) subsystem will be detected after both u and v have initialized their state.

Suppose we are given a locally checkable automaton 𝒩. In the transformation outlined above, we need to replace the events to send and receive packets by events to send and receive messages. Thus we cannot guarantee that the transformed automaton stabilizes to the behaviors of 𝒩|L itself, but only to a version of 𝒩|L in which the action names have been suitably renamed. This motivates:

Definition 8.1.1 Consider a network automaton 𝒩. We define R(𝒩), the message-renamed version of 𝒩, to be the automaton that is identical to 𝒩 except that, for all u and v:

• Every SEND_{u,v}(*) (i.e., packet send) event in 𝒩 is renamed as a SENDM_{u,v}(*) (i.e., message send) event.

• Every RECEIVE_{u,v}(*) (i.e., packet receive) event in 𝒩 is renamed as a RECEIVEM_{u,v}(*) (i.e., message receive) event.

• Every FREE_{u,v} (i.e., packet free notification) event in 𝒩 is renamed as a FREEM_{u,v} (i.e., message free notification) event.

The message-renamed version of a set of behaviors of a network automaton is defined similarly.
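Since R(𝒩) differs from 𝒩 only in action names, renaming a behavior is a pointwise map over its events. A minimal Python sketch, assuming (hypothetically) that each event is represented as a tuple whose first component is the action name and whose remaining components are the subscripts and arguments:

```python
from typing import List, Tuple

# Packet-level action names and their message-level counterparts (Definition 8.1.1).
RENAMING = {"SEND": "SENDM", "RECEIVE": "RECEIVEM", "FREE": "FREEM"}

Event = Tuple  # hypothetical encoding: (action_name, *subscripts_and_arguments)


def message_rename(behavior: List[Event]) -> List[Event]:
    """Return the message-renamed version of a behavior.

    Packet events are renamed; every other event (e.g., REQUEST or SIGNAL)
    is passed through unchanged, and the order of events is preserved.
    """
    return [(RENAMING.get(name, name), *rest) for name, *rest in behavior]
```

Because only names change, any user of the original interface can be adapted by the inverse map, which is the sense in which renaming does not affect the services offered.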

Note that renaming does not really affect the services offered by an automaton 
because the users of the automaton can change their interface to accommodate the 
renaming. 

With this definition, we can state the main theorem of this chapter. It states that any locally checkable network automaton 𝒩 can be transformed into another automaton 𝒩+ such that 𝒩+ stabilizes to a version of 𝒩 in which (i) all local predicates hold, (ii) the node and link delays are increased by a constant factor, and (iii) the packet events are renamed as message events.
Theorem 8.1.2 Global Correction: Consider any network automaton 𝒩 = Net(G,N) that is locally checkable for some predicate L using link predicate set C. Then there exists some 𝒩+ that is a message-renamed UIOA for graph G, and a constant c, such that 𝒩+ stabilizes to the behaviors of R(𝒩(c)|L) in O(n) time.

8.2 Proof of the Global Correction Theorem 

We present the main ideas in the proof of the Global Correction Theorem in the following subsections. We first sketch the construction of the augmented automaton 𝒩+. Then we show that, within linear time of the start of any execution of 𝒩+, all reset requests disappear. We use this to conclude that 𝒩+ stabilizes to a message-renamed version of 𝒩 in linear time. Finally, we make some observations about modularity in the proof.

8.2.1 Construction of 𝒩+

Let L_{u,v} be the local predicate for each leader edge (u,v). We construct 𝒩+ as follows. As in Chapter 5, for every leader edge (u,v) we add actions to nodes u and v to perform the local snapshot protocol on the (u,v) subsystem. The local snapshot protocol is used to detect a violation of predicate L_{u,v}. When the leader u detects a violation, u makes a reset request. The code is derived from the code used for the proof of the Local Correction theorem in Chapter 5 (Figures 5.6, 5.7, and 5.5) by making the following changes:

• All sending and receiving of packets in the modified node automata N+ is re- 
placed by sending and receiving of messages so that N+ can be a user of the 
reset subsystem. Thus instead of sending request and response packets, we send 
request and response messages. Similarly we have to replace the free packet 
notification events with free message notification events. 

• There is no longer a need for a mode variable in N+ because each phase is 
implicitly a snapshot phase. Whenever the mode is checked in the original code, 
we only follow the code path corresponding to mode = snapshot. 

• We add a requestbit_u variable to N+ to remember to do a reset request.

• When a response is received at the leader and a local predicate violation is detected, requestbit_u is set to true (instead of changing mode_u[v] to reset).

• We compose the modified node automata N+ with the reset protocol ℛ|L of Chapter 7. Thus, in the final automaton, N+ has input action SIGNAL_u and output action REQUEST_u.

• N+ makes a reset request REQUEST_u whenever requestbit_u is true; on taking this action, requestbit_u is set to false.

• On receiving a SIGNAL_u event, N+ does a local restart. It first resets the basic state of the original automaton N_u to some initial value I_u (we will discuss how I_u is determined below). Also, for each neighbor v of u, u locally resets all the phase and free variables. More precisely, the free_u[v], freeq_u[v] and phase_u[v] variables are all set to false for all neighbors v. The counter variable count_u[v] is initialized to 1 (if u is the leader) and to 0 (if u is not the leader). This ensures that after a signal event at u and v, all (u,v) phases are clean and all message send events at u are safe.

• The definition of local checkability in Chapter 5 ensures that there is always at least one global state s which satisfies all local predicates. Now, if there are any messages in transit in global state s, we consider the state s' that results when all these messages are received by their destinations. Because the local predicates are all closed predicates, s' satisfies all local predicates as well. In addition, s' has no messages in transit. We then choose the initial state I_u for each node u to be s'|u.
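The choice of the initial states I_u can be sketched as follows: starting from a global state s that satisfies all local predicates, deliver every message in transit to obtain s', then read off each node's state. In this Python sketch, node states are opaque values, each link's contents are a FIFO list of messages, and `receive` is a hypothetical stand-in for the effect of the original protocol's receive action. The sketch assumes that a receive, being an input action, does not itself put new messages in transit.

```python
from typing import Callable, Dict, List, Tuple

Node = str
Channel = Tuple[Node, Node]  # (sender, receiver)


def initial_states(
    node_state: Dict[Node, object],
    in_transit: Dict[Channel, List[object]],
    receive: Callable[[object, Node, object], object],
) -> Dict[Node, object]:
    """Deliver every in-transit message of s, then return I_u = s'|u for each u.

    `receive(state, sender, msg)` is a hypothetical stand-in for the effect of
    the original protocol's RECEIVE action at the receiving node.
    """
    states = dict(node_state)  # copy, so the input state s is left unmodified
    for (sender, receiver), msgs in in_transit.items():
        for msg in msgs:  # deliver in FIFO order on each link
            states[receiver] = receive(states[receiver], sender, msg)
    return states
```

The order in which different links are drained does not matter for the argument in the text: since every local predicate is closed, each delivery preserves all of them, so any complete delivery order yields an s' that satisfies them.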

The modified code is described in Figure 8.1, Figure 8.2, and Figure 8.3. The message formats and timing partitions are as in the proof of the Local Correction Theorem. As before, we hide all actions of 𝒩+ that are not actions of 𝒩.

8.2.2 All reset triggers disappear in Linear Time 

In the proof we will make heavy use of the reset protocol specification given in Chapter 7 and some aspects of the proof of the Local Correction Theorem in Chapter 5. Refer to Chapter 7 for definitions of normal messages, the send corresponding to a receive, causality, timeliness, and the mating relation. Refer to Chapter 5 for definitions of a phase and a clean link.

    SENDM_{u,v}(p)                  (*output action, for p ∈ data only*)
      Preconditions:
        freeq_u[v] = true and free_u[v] = true
        p is head of queue_u[v]
        ((l(u,v) = u) and (phase_u[v] = false)) OR ((l(u,v) = v) and (turn_u[v] = data))
      Effect:
        freeq_u[v] := false and free_u[v] := false
        Remove p from head of queue_u[v]
        turn_u[v] := response       (*give response messages a turn*)
        phase_u[v] := true          (*start a new checking/correction phase*)

    FREEM_{u,v}                     (*input action*)
      Effect:
        freeq_u[v] := true and free_u[v] := true

Figure 8.1: Code for the modified SENDM_{u,v}(p) actions at a modified node N+ in order to do Global Correction.

    SENDM_{u,v}(p_req)              (*output action: u repeatedly sends a request till it gets a response*)
      Preconditions:
        l(u,v) = u                                      (*u is the leader of link subsystem*)
        (phase_u[v] = true) or (queue_u[v] is empty)    (*phase in progress or no data waiting*)
        freeq_u[v] = true                               (*no message in transit on link to v*)
        p_req.count = count_u[v]                        (*counter correct*)
      Effect:
        freeq_u[v] := false         (*set to false until link says it is free*)
        phase_u[v] := true          (*remains true until matching response returns*)

    RECEIVEM_{v,u}(p_req)           (*input action, receive request at u from v*)
      Effect:
        If p_req.count ≠ count_u[v] and l(u,v) = v then (*not a duplicate or invalid message*)
          count_u[v] := p_req.count                     (*remember count*)

Figure 8.2: Code to send and receive request messages at node u in order to do Global Correction.

    SENDM_{u,v}(p_resp)             (*output action: u repeatedly sends a response to last request*)
      Preconditions:
        l(u,v) = v                                      (*u is not the leader of link subsystem*)
        (turn_u[v] = response) or (queue_u[v] is empty) (*response's turn or no data messages waiting*)
        freeq_u[v] = true                               (*no message in transit on link to v*)
        p_resp.count = count_u[v]
        p_resp.nodestate = s|u
      Effect:
        turn_u[v] := data           (*give data messages a turn*)
        freeq_u[v] := false         (*set to false until link says it is free*)

    RECEIVEM_{v,u}(p_resp)          (*input action to receive response at u from v*)
      Effect:
        If (count_u[v] = p_resp.count) and (phase_u[v] = true) and (l(u,v) = u) then
          If (s|u, nil, nil, p_resp.nodestate) ∉ L_{u,v} then requestbit_u := true
          phase_u[v] := false                           (*end of phase*)
          count_u[v] := (count_u[v] + 1) mod 4

    REQUEST_u                       (*output action to make a reset request*)
      Preconditions:
        requestbit_u = true
      Effect:
        requestbit_u := false

    SIGNAL_u                        (*input action to receive a Signal at u*)
      Effect:
        For all neighbors v                             (*make links to all neighbors clean*)
          phase_u[v] := false
          If l(u,v) = u then count_u[v] := 1 else count_u[v] := 0
          Empty queue_u[v] and set freeq_u[v] and free_u[v] to false
        s|u := I_u                                      (*initialize node state*)

Figure 8.3: Code to send and receive response messages and to make reset requests and respond to signals at node u.
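The request/response checking of Figures 8.1 to 8.3 can be exercised on a single leader edge (u,v). The Python sketch below is a simplification under stated assumptions: node states are integers, messages are delivered immediately and in order, data messages, queues and free notifications are omitted, and `pred` is a hypothetical stand-in for testing whether (s|u, nil, nil, nodestate) ∈ L_{u,v}. It shows a response sent before the request arrives carrying a stale count, which the leader ignores, and a detected violation setting requestbit_u.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Stand-in for the check (s|u, nil, nil, nodestate) in L_{u,v}; links are empty.
Predicate = Callable[[int, int], bool]


@dataclass
class Leader:              # node u, where l(u,v) = u
    state: int
    count: int = 1         # SIGNAL_u sets count_u[v] := 1 at the leader
    phase: bool = False
    requestbit: bool = False


@dataclass
class Follower:            # node v, where l(u,v) = u
    state: int
    count: int = 0         # SIGNAL_v sets count_v[u] := 0 at the non-leader


def send_request(u: Leader) -> int:
    u.phase = True         # remains true until a matching response returns
    return u.count         # p_req.count = count_u[v]


def receive_request(v: Follower, req_count: int) -> None:
    if req_count != v.count:   # not a duplicate
        v.count = req_count    # remember count


def send_response(v: Follower) -> Tuple[int, int]:
    return (v.count, v.state)  # (p_resp.count, p_resp.nodestate)


def receive_response(u: Leader, resp: Tuple[int, int], pred: Predicate) -> bool:
    """Process a response at the leader; return True iff it ends the phase."""
    count, nodestate = resp
    if u.phase and count == u.count:
        if not pred(u.state, nodestate):
            u.requestbit = True        # a (u,v) reset trigger
        u.phase = False                # end of phase
        u.count = (u.count + 1) % 4
        return True
    return False                       # stale or unsolicited response: ignored


def signal_leader(u: Leader, initial: int) -> None:
    """Local restart at u (queues and free bits are not modeled)."""
    u.state, u.count, u.phase = initial, 1, False


def signal_follower(v: Follower, initial: int) -> None:
    """Local restart at v (queues and free bits are not modeled)."""
    v.state, v.count = initial, 0
```

Counts advance modulo 4, mirroring the RECEIVEM code for responses in Figure 8.3.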

The proof consists of showing that all reset requests will eventually stop and that 
in linear time after this L holds. Define a (u,v) reset trigger to be the receipt of 
a snapshot response from v at u that causes u to set requestbit u = true. The main 
part of the proof consists of showing that all reset triggers disappear in linear time 
(which by causality implies that all signals disappear). We start with some preliminary 
definitions. 

Consider an execution α of 𝒩+. Define a quiescent state in α to be a state s_i in α such that:

• All messages received after state s_i in α are normal.

• All signal events that occur after state s_i are preceded by a request event that occurs within linear time before the signal event.

• For any pair of neighbors u and v, a signal event at u that occurs after state s_i is accompanied by a signal event at node v that occurs within linear time before or after the signal event at u.

We have the following lemma: 

Lemma 8.2.1 Quiescent Lemma: A quiescent state occurs within linear time of the start of any execution of 𝒩+.

Proof: 𝒩+ is the composition of the augmented node automata with ℛ|L. We know that the behaviors of ℛ|L are timely and causal. The lemma follows from the first timeliness property, the first causality property, and the fourth timeliness property. □

Suppose that u is the leader of some (u,v) subsystem. We show that in any execution α of 𝒩+, there exists some linear time t such that all (u,v) reset triggers disappear within time t after a quiescent state in α. Since a quiescent state occurs within linear time after the start of α, it follows that all reset triggers disappear within linear time after the start of α.

The proof is in two parts. First, we show that one of two cases must occur in linear 
time after a quiescent state. Next, we show that no (u,v) reset triggers can occur after 
the occurrence of either of the two cases. 
One of Two Cases must Occur in Linear Time after a Quiescent State

To define the two cases, we need two simple preliminary definitions. Recall from Chapter 7 the definitions of a signal interval and of a send corresponding to a normal message. Intuitively, an initialized signal interval at a node is a signal interval that begins with a signal event at the node; we give it this name because each node initializes its local state immediately after a signal event. Next, a regular message receipt is the receipt of a message that was received in an initialized signal interval and sent in an initialized signal interval. Formally:

Definition 8.2.2 An initialized signal interval at node u is a signal interval at node u that begins with a SIGNAL_u event.

Definition 8.2.3 A message receive event is regular if 

• The message receive event is normal 

• The message is received in an initialized signal interval at the receiving node. 

• The send corresponding to the receive event occurs in an initialized signal interval 
at the sender. 
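Definitions 8.2.2 and 8.2.3 can be phrased as a simple check over a recorded trace. In this Python sketch the trace representation is an assumption: each node's SIGNAL events are given as a list of times, all events carry distinct times, and each receive event records whether it is normal together with the time of its corresponding send. It also assumes that the signal intervals at a node are delimited by its SIGNAL events, so the interval containing time t is initialized exactly when some SIGNAL at that node occurred before t.

```python
from dataclasses import dataclass
from typing import Dict, List


def in_initialized_interval(signal_times: List[float], t: float) -> bool:
    """True iff the signal interval containing time t began with a SIGNAL event,
    i.e., iff some SIGNAL at this node occurred before t."""
    return any(s < t for s in signal_times)


@dataclass
class ReceiveEvent:
    sender: str
    receiver: str
    normal: bool        # normal in the sense of Chapter 7
    send_time: float    # time of the send corresponding to this receive
    recv_time: float


def is_regular(ev: ReceiveEvent, signals: Dict[str, List[float]]) -> bool:
    """Definition 8.2.3: normal, received in an initialized signal interval,
    and sent in an initialized signal interval."""
    return (ev.normal
            and in_initialized_interval(signals.get(ev.receiver, []), ev.recv_time)
            and in_initialized_interval(signals.get(ev.sender, []), ev.send_time))
```

For example, if u signals at time 1.0 and v at time 2.0, a normal message sent by v at 3.0 and received by u at 4.0 is regular, while one sent by v at 1.5 is not.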

Next, we state the main lemma of this subsection. 



Lemma 8.2.4 Quiescent Cases: Consider any execution α of 𝒩+ and any leader edge (u,v). Within linear time after any quiescent state s_i of α there is a state s_j such that one of the following two cases is true:

• Case 1: Any message received by u from v (and vice versa) after s_j is regular.
OR

• Case 2: In state s_j, L_{u,v} holds and (u,v) is clean. Also, the interval [s_i,s_j] is contained in some signal interval at u as well as some signal interval at v, and these two signal intervals are mates.
Proof: (Sketch) Recall the definition of a (u,v) phase from Chapter 5. Roughly speaking, a (u,v) phase is an interval that begins with the leader u sending a request and ends with u receiving a matching response. Recall also the definition of a clean phase from Chapter 5. The proof also makes considerable use of the timeliness, consistency, and causality properties (see Definitions 7.3.3, 7.3.5, and 7.3.6) of the reset automaton ℛ|L.

We choose state s_j as some state that occurs after state s_i and satisfies either one of two properties:

a) Property 1: Six (u,v) phases occur in the interval [s_i,s_j] AND there are no signal events at u or v in [s_i,s_j] AND L_{u,v} holds in state s_j.

OR

b) Property 2: There is some state s_k in the interval [s_i,s_j] such that at least one signal event occurs at both u and v before s_k. Also, for any message receive event that occurs after s_j, the corresponding send event occurs after s_k (i.e., all messages in transit in state s_k are delivered before state s_j).

We now show that we can find such a state s_j that occurs within linear time after s_i.

From the second and third timeliness conditions, we see that message delivery between u and v behaves exactly like a UDL in the absence of signal events at u or v. In other words, the reset subsystem delivers messages and free notifications in constant time in the absence of signal events. Thus, from the proof of the Phase Rate lemma in Chapter 5 (Lemma 5.6.5), we see that in the absence of signal events at either u or v, a (u,v) phase will complete in constant time. Similarly, in the absence of signal events at either u or v, six (u,v) phases will complete in constant time. By Lemma 5.6.12, the sixth (u,v) phase is a clean phase, i.e., a phase in which the matching response is sent by v after the receipt of the request. Thus the sixth phase will produce a correct snapshot of the (u,v) subsystem, and by the end of the sixth phase either L_{u,v} holds or node u will make a reset request. But if node u makes a reset request, then by the second causality property, a signal event will occur at u within linear time after s_i.

We conclude from the last paragraph that, within linear time after s_i, either we reach a state s_j satisfying Property 1 or a signal event occurs at u or v. But if a signal event occurs at u or v within linear time after s_i, then we know (from our choice of s_i and the fourth timeliness property) that a signal event occurs at both u and v by some state s_k that occurs within linear time after s_i. In the latter case, the first timeliness property tells us that there is some state s_j that occurs within linear time of s_k and such that all messages "in transit" in state s_k have been delivered before state s_j. Thus, within linear time after s_i, we reach a state s_j satisfying either Property 1 or Property 2.

We are now ready to prove the lemma. If s_j satisfies Property 2, we show that Case 1 must be true; and if s_j satisfies Property 1, we show that Case 2 must be true.

If s_j satisfies Property 2, then we know that all messages received at u from v (and vice versa) after state s_j were sent after some state s_k. We also know that at least one signal event has occurred at both u and v before state s_k. Thus all such message receive events are regular, and we have Case 1 in the lemma statement.

If s_j satisfies Property 1, then we know by Lemma 5.6.12 that (u,v) is clean in state s_j. We also know that L_{u,v} holds in s_j and that no signals occur at either u or v in [s_i,s_j]. Thus there must be some signal interval (say S_u) at u that contains the interval [s_i,s_j]. Similarly, there must be some signal interval (say S_v) at v that contains the interval [s_i,s_j]. Now the sixth (u,v) phase in [s_i,s_j] is a clean phase. But in a clean phase, v sends a response to u after v receives a request from u. Thus there is at least one message sent by u after s_i that was received by v before s_j. Hence, from the consistency property, the intervals S_u and S_v must be mates. Thus we have Case 2 in the lemma statement. □



All (u,v) triggers disappear after either Case 1 or Case 2 

Next we show that, for any leader edge (u,v), all (u,v) reset triggers stop after one of the two cases listed above occurs. Since we have just shown that one of these two cases must occur within linear time, we conclude that all reset triggers disappear within linear time.

We first show all (u,v) triggers disappear after Case 1 occurs: 

Lemma 8.2.5 Case 1 Trigger Termination: Consider any execution α of 𝒩+ and any leader edge (u,v). Suppose that after any quiescent state s_i of α there is a state s_j such that any message received by u from v (and vice versa) after s_j is regular. Then no (u,v) triggers occur after state s_j in α.
Figure 8.4: Correct Snapshots after nodes u and v have each performed a signal event.

Proof: (Sketch) Recall that a (u,v) trigger is the receipt of a response at u that causes u to set requestbit_u = true. Consider the receipt of any response at u after s_j. By hypothesis, this receipt is regular, so the response must be received in an initialized signal interval (say S_u at u) and must correspond to a response sent in an initialized signal interval (say S_v at v). This is shown in Figure 8.4.

We claim that the sequence of messages received by u in S_u is a prefix of the sequence of messages sent by v in S_v, and vice versa. We will refer to this as the prefix property. The prefix property follows from the consistency property because:

• S_u and S_v are mates,

• All messages received after s_i are normal, and

• All messages sent in an initialized signal interval are safe.

We now claim that the matching response cannot be a (u,v) trigger, because of the following argument. First, at the start of S_u, u sets its basic state to I_u, and at the start of S_v, node v sets its basic state to I_v. But (I_u, nil, nil, I_v) ∈ L_{u,v}, since I_u = s'|u and I_v = s'|v for a state s' that satisfies all local predicates and has no messages in transit. Next, at the start of S_u, count_u[v] is initialized to 1; also, at the start of S_v, count_v[u] is initialized to 0. Thus, by the prefix property, the sequence of states and messages sent and received in S_u and S_v could have occurred in some execution γ of the (u,v) subsystem such that (u,v) is clean and L_{u,v} holds in the first state of γ. Thus, as in Chapter 5, any matching responses will carry accurate snapshot information and will not result in a (u,v) reset trigger. □

Note that what makes both this result and the result in Chapter 5 work is that $L_{u,v}$ is a closed local predicate: once $L_{u,v}$ holds, it continues to hold, regardless of the behavior of other subsystems. Next, we show that all (u,v) triggers disappear after Case 2 occurs:

Lemma 8.2.6 Case 2 Trigger Termination: Consider any execution $\alpha$ of $\mathcal{N}^+$ and any leader edge (u,v). Suppose that after any quiescent state $s_i$ of $\alpha$ there is a state $s_j$ such that:

• $L_{u,v}$ holds and (u,v) is clean in $s_j$.

• The interval $[s_i, s_j]$ is contained in some signal interval at u as well as in some signal interval at v, and these two signal intervals are mates.

Then no (u,v) triggers occur after state $s_j$ in $\alpha$.

Proof: (Sketch) The argument is very similar to that for Case 1, except for the signal intervals at u and v that contain the interval $[s_i, s_j]$. Let the signal interval at u that includes state $s_j$ be called $S_u$, and let the signal interval at v that includes state $s_j$ be called $S_v$. We know that $S_u$ and $S_v$ are mates by assumption, and that in $s_j$, $L_{u,v}$ holds and (u,v) is clean. This is shown in Figure 8.5.

Thus the sequence of states and messages sent and received in $S_u$ and $S_v$ after state $s_j$ could have occurred in some execution $\gamma$ of the (u,v) subsystem such that (u,v) is clean and $L_{u,v}$ holds in the first state of $\gamma$. But then any matching responses will carry accurate snapshot information and will not result in a (u,v) reset trigger. Thus, for the first signal intervals in Case 2, we rely on an explicit state $s_j$ (in which the local predicate holds and the link is clean). By contrast, in Case 1 we relied on the first signal intervals being initialized signal intervals.

The argument for signal intervals after $S_u$ and $S_v$ in Case 2 is identical to the argument for Case 1 because such signal intervals are initialized signal intervals. ∎
Figure 8.5: Correct snapshots after $L_{u,v}$ holds and link (u,v) is clean.

8.2.3 All Local Predicates Hold and All Signals Stop in Linear Time

From the Quiescent Lemma, the Quiescent Cases Lemma, and the Case 1 and Case 2 Trigger Termination Lemmas, it follows that all reset triggers disappear in linear time after the start of any execution of $\mathcal{N}^+$. Hence, by causality, all signal events also disappear in linear time. More precisely, any execution $\alpha$ of $\mathcal{N}^+$ has some $O(n)$-suffix $\gamma$ in which there are no signal events.

But now it is easy to see that all local predicates hold within constant time in $\gamma$. We know from the second and third timeliness conditions (and the fact that there are no signals in $\gamma$) that six (u,v) phases will complete within constant time after the start of $\gamma$. After these six phases are complete, $L_{u,v}$ must hold. If not, u would detect a violation in the sixth phase (which is guaranteed to be a clean phase) and would make a reset request. But this would result in a signal event in $\gamma$, a contradiction.

Lastly, any execution of $\mathcal{N}^+$ that contains no signal events and in which $L_{u,v}$ holds for all links can be shown to be an execution of $\mathcal{N}(c)|L$ for some constant c. As in Chapter 5, c represents a constant slowdown due to the overhead of periodic checking.
8.2.4 Proof of Global Correction Theorem is Modular

In the transformation used to prove the Global Correction Theorem, we used the reset protocol $\mathcal{R}|L$ from Chapter 7. However, the proof uses only properties of the behaviors of $\mathcal{R}|L$, and thus $\mathcal{R}|L$ could have been replaced by any automaton that has the same interface and the same behaviors. Further, note that the modified node automata $\mathcal{N}_u^+$ (that we create by the transformation) are UIOA and $\mathcal{R}|L$ is a CIOA (since every reachable state is also a start state). Thus, the modularity theorem assures us that we can replace $\mathcal{R}|L$ by its stabilizing implementation (the automaton $\mathcal{R}^+$ described in Chapter 7) without affecting the Global Correction result.



8.3 A Locally Checkable Spanning Tree protocol 

We begin by describing earlier stabilizing spanning tree protocols and their disadvantages. Then we describe a locally checkable spanning tree protocol. This protocol will be used in Section 8.5 to construct a stabilizing spanning tree protocol.

8.3.1 Previous Work on Spanning Tree Protocols 

The basic idea in virtually all spanning tree algorithms is that nodes report the smallest 
node ID seen so far (and the shortest distance to this smallest ID node) to their 
neighbors. Each node then picks as its parent the neighbor that knows of the smallest 
ID. If more than one neighbor reports the smallest ID, the node picks from among 
these the neighbor that reports the smallest distance. If two neighbors report the 
same distance and root ID, then an arbitrary tie-breaker is used to select the parent. 
A node sets its estimate of the root ID to equal its parent's root ID, and updates its 
distance from the root to be one plus the parent's distance. Because of the way the 
distance from the root is calculated, this approach is basically an adaptation of the 
Bellman-Ford Algorithm. 
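To make the Bellman-Ford flavour of this idea concrete, here is a minimal Python sketch under assumptions that are not part of the protocols discussed: integer node IDs, a graph given as an adjacency dictionary, lockstep synchronous rounds, and ties broken by the lowest neighbor ID (one possible choice of arbitrary tie-breaker). Each node takes the lexicographic minimum of its own default (u, 0, u) and of (root, distance + 1, neighbor) over its neighbors. The helper names `bf_round` and `bf_spanning_tree` are hypothetical.

```python
# Sketch: Bellman-Ford style spanning tree computation in lockstep rounds.
# Assumptions (illustrative only): integer node IDs, adjacency-dict graph,
# synchronous rounds, ties broken by the lowest neighbor ID.

def bf_round(graph, est):
    """One round: est maps node -> (root, dist, parent)."""
    new = {}
    for u, nbrs in graph.items():
        best = (u, 0, u)                      # u's own default: root at distance 0
        for v in nbrs:
            root, dist, _ = est[v]
            # Smallest root first, then smallest distance, then lowest neighbor ID.
            best = min(best, (root, dist + 1, v))
        new[u] = best
    return new

def bf_spanning_tree(graph):
    """Run rounds from a correctly initialized state until nothing changes."""
    est = {u: (u, 0, u) for u in graph}
    while True:
        nxt = bf_round(graph, est)
        if nxt == est:
            return {u: parent for u, (_, _, parent) in est.items()}
        est = nxt

print(bf_spanning_tree({1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3]}))
# {1: 1, 2: 1, 3: 1, 4: 2}
```

Starting from a correctly initialized state, this converges in a number of rounds proportional to the diameter; the next paragraphs explain why the same rule misbehaves when the initial state is corrupted.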

However, in a dynamic network (and in a stabilizing setting), this approach encounters an obstacle known as "ghost roots". This phenomenon occurs whenever the root crashes: its ID, which was the smallest in the system, is still the smallest from
the same effect can be caused by initial errors that introduce a root ID lower than 
the ID of any network node. This ID can potentially remain forever in the system! 
Even if the nodes maintain a counter to reflect their distance from the alleged root, 
this counter will simply grow unboundedly. Since the ghost root phenomenon is not a 
rare event, several ways to overcome this difficulty have been suggested. 

One solution, known as the "hop counter" approach, is to have some pre-determined 
bound on the diameter of the network at each node, and to discard the old root 
whenever the associated counter reaches some limit (see, e.g., [AG90]). Unfortunately,
this pre-determined bound must be quite high, and hence, the stabilization time of such 
counting up schemes is poor in practice [CRKG89a, CRKG89b, Gar89, RF89, Awe90]. 
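A small simulation shows both the ghost root effect and why a pre-set bound makes stabilization slow. This is a hedged sketch, not the protocol of [AG90] or of any other cited work. It assumes synchronous rounds, integer IDs, and a hypothetical distance bound `bound` beyond which an estimate is discarded. A corrupted initial state plants the ghost root ID 0 at one node.

```python
# Sketch: ghost-root flushing with a hop-counter bound (synchronous rounds).
# Assumptions (illustrative only): integer IDs, adjacency-dict graph,
# lockstep rounds, and a hypothetical distance bound `bound`.

def hop_round(graph, est, bound):
    """One round: est maps node -> (root, dist, parent)."""
    new = {}
    for u, nbrs in graph.items():
        best = (u, 0, u)                      # u's own default estimate
        for v in nbrs:
            root, dist, _ = est[v]
            if dist + 1 <= bound:             # discard estimates past the bound
                best = min(best, (root, dist + 1, v))
        new[u] = best
    return new

def rounds_to_stabilize(graph, est, bound, max_rounds=10_000):
    """Return (rounds until a fixed point, final estimates)."""
    for k in range(max_rounds):
        nxt = hop_round(graph, est, bound)
        if nxt == est:
            return k, est
        est = nxt
    raise RuntimeError("no fixed point within max_rounds")

# Path 1-2-3-4; node 3 starts with a corrupted estimate naming ghost root 0.
graph = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
start = {u: (u, 0, u) for u in graph}
start[3] = (0, 0, 3)
for bound in (4, 10, 40):
    rounds, final = rounds_to_stabilize(graph, start, bound)
    print(bound, rounds, final)
```

In this run the ghost ID moves one hop further, with a distance one larger, in each round until its distance exceeds the bound. As a result, the number of rounds before the fixed point grows with the bound rather than with the diameter of the path.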

Another widely used stabilizing spanning tree protocol is the IEEE 802.1 bridge routing protocol, which is based on the design in [Per85]. This solution uses an approach
that we call timer flushing. The basic idea [Per85] is that each node "times out" 
information received from its neighbors unless the information is refreshed periodically. 
Any node that thinks it is the root is responsible for periodically sending updates to 
all its neighbors in order to refresh their state information. Any other node X sends 
estimates to its neighbors only after receiving a message from the node X thinks is its 
parent. The upshot is that if there is an old root in the system, the information about 
this old root will eventually "time out", after which the system will stabilize. 

However, timer flushing suffers from several drawbacks. First, the node clocks (by 
which packet lifetimes are enforced) can have different rates. Second, the message 
delivery time over links typically has large variance; since the topology of the network 
is not known in advance, the variance of the message delivery time across the whole 
network is even higher. A conservative timeout bound must take into account high 
message latencies and the worst-case topology, even though the latencies of an actual 
execution may be considerably (e.g., an order of magnitude) smaller. This leads to an 
order of magnitude slowdown of the stabilization process. Lastly, the parameters of 
such "global" timeout protocols often have complicated dependencies. Tuning these 
parameters is typically quite difficult. 

The stabilizing spanning tree protocol we will describe in Section 8.5 is very similar to the two schemes described above. However, it uses a different mechanism for detecting and recovering from states with ghost roots, which reduces the stabilization time of the resulting protocol.
The detection mechanism is based on the following observation. Consider an execution of the simple spanning tree protocol that starts with a state in which all nodes
are correctly initialized and there are no messages in transit on the links. Now focus 
on some node. Throughout the execution the node maintains a current estimate of the 
root ID, and another estimate for its distance from this alleged root. It can be shown 
that in the course of legal executions, the node's estimate of the root ID never goes 
up; also while such a root estimate is fixed, the distance estimate never goes up. This 
property can be cast in the form of a local predicate for each link. If the predicate 
holds, then the algorithm will produce a spanning tree. This immediately suggests the 
stabilizing algorithm: whenever the predicate is violated, the node that detects the 
violation makes a reset request. In the execution that follows the last reset signal, all 
the information will be correct. 
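The monotonicity property can be phrased as a check on the sequence of (root, distance) estimates held by one node. The sketch below is illustrative only. It assumes a recorded list of estimate pairs and relies on Python's lexicographic tuple ordering, which matches "the root never goes up, and while the root is fixed the distance never goes up". The protocol itself checks this property per link rather than over a recorded trace, and `first_violation` is a hypothetical name.

```python
# Sketch: detect the first estimate that goes up in a node's trace.
# Assumption (illustrative only): estimates are (root, distance) tuples;
# Python compares tuples lexicographically (root first, then distance).

def first_violation(estimates):
    """Return the index of the first estimate larger than its predecessor, or None."""
    for i in range(1, len(estimates)):
        if estimates[i] > estimates[i - 1]:
            return i
    return None

legal = [(7, 0), (3, 2), (3, 1), (1, 4), (1, 2)]   # root drops, distance may jump
corrupted = [(0, 5), (0, 6), (2, 1)]              # ghost root 0 counting up
print(first_violation(legal))      # None
print(first_violation(corrupted))  # 1
```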

Rather than directly describe the final spanning tree protocol of Section 8.5, we 
will derive it in the following way. In the next subsection, we will describe a locally 
checkable (but non-stabilizing) spanning tree protocol on which the final spanning 
tree protocol is based. Of course, the Global Correction Theorem is applicable to 
this protocol. However, we will show in the following section that the spanning tree 
protocol is also one-way checkable. We then combine these observations to produce the 
final stabilizing spanning tree protocol in Section 8.5. The resulting protocol is simpler 
and more efficient than a spanning tree protocol based only on Global Correction. 

8.3.2 Code for Locally Checkable Spanning Tree Protocol 

We now describe the code for our locally checkable (but non-stabilizing!) spanning 
tree protocol. After all local predicates hold, this protocol computes a spanning tree 
in time proportional to the diameter of the network. 

Before we delve into the code of the protocol, we define what it means for a spanning 
tree protocol to be stabilizing. We assume that the network topology is described by 
some topology graph G and that there is an output action $REPORT_u(p)$ at every node u. This action is used to report the parent p of node u in the spanning tree, where p is some neighbor of u in the network graph. We say that a spanning tree protocol is stabilizing if it stabilizes to the stable tree behaviors for graph G. The stable tree behaviors for graph G are the behaviors $\beta$ such that:

• For every suffix $\gamma$ of $\beta$ and for every node u, a $REPORT_u$ event occurs in $\gamma$ within constant time (i.e., nodes report their parent values at constant intervals of time).

• For every node u, if there is a $REPORT_u(p_1)$ event and a later $REPORT_u(p_2)$ event in $\beta$, then $p_1 = p_2$ (i.e., the parent values reported by nodes never change).

• Consider any set of $REPORT_*(*)$ actions in $\beta$ containing exactly one $REPORT_u(p_u)$ action for every node u. Then the graph induced by the values of the $p_u$ variables is a spanning tree of G (i.e., the parent values reported by nodes form a spanning tree of G).
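The third condition can be checked mechanically for one set of reports. This is a minimal sketch with these assumptions: the graph is an adjacency dictionary, `reports` maps each node u to its reported parent $p_u$, and the root reports itself as its own parent, as the smallest-ID node does in the protocol of Figure 8.6. The function name is hypothetical.

```python
# Sketch: do the reported parents form a spanning tree of G?
# Assumptions (illustrative only): adjacency-dict graph, reports[u] = p_u,
# and the root reports itself as its parent.

def is_spanning_tree(graph, reports):
    if set(reports) != set(graph):
        return False                      # need exactly one report per node
    roots = [u for u, p in reports.items() if p == u]
    if len(roots) != 1:
        return False                      # exactly one node may be its own parent
    for u, p in reports.items():
        if p != u and p not in graph[u]:
            return False                  # a parent must be a neighbor in G
    for u in graph:                       # every parent chain must reach the root
        seen, x = set(), u
        while reports[x] != x:
            if x in seen:
                return False              # cycle of parent pointers
            seen.add(x)
            x = reports[x]
    return True

g = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3]}
print(is_spanning_tree(g, {1: 1, 2: 1, 3: 1, 4: 2}))  # True
print(is_spanning_tree(g, {1: 1, 2: 4, 3: 1, 4: 2}))  # False: 2 and 4 form a cycle
```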

Our locally checkable spanning tree protocol is described by the automaton $\mathcal{T}_u$ shown in Figure 8.6. The protocol is based on the simple idea alluded to earlier.
Each node keeps track of the smallest node ID seen so far and the shortest distance to 
this smallest ID node. Each node uses this information to compute its parent in the 
tree. 

Thus each node u maintains a parent pointer $parent_u$, a current estimate $r_u$ of the root's identity, and a current distance estimate $d_u$. We denote by $(r_u, d_u)$ the ordered pair at node u. We use lexicographic ordering for 2-tuples and 3-tuples. For example, $(r_v, d_v) < (r_u, d_u)$ means that either $r_v < r_u$, or $r_v = r_u$ and $d_v < d_u$. Similarly, $(r_v, d_v, parent_v) < (r_u, d_u, parent_u)$ means that either $(r_v, d_v) < (r_u, d_u)$, or $(r_v, d_v) = (r_u, d_u)$ and $parent_v < parent_u$. Each node also maintains a (possibly outdated) copy of the root and distance estimates of each of its neighbors. Thus the estimates of neighbor v are stored at u in the variables $r_u[v]$ and $d_u[v]$.

For compactness, we let both arrays have an entry for the node itself, in which the default values are "hardwired": thus $r_u[u] = u$ and $d_u[u] = -1$. A node always chooses its own estimate based on the minimum of the estimates of all its neighbors and its own default estimate. Node u chooses its own estimate of the root from this minimum estimate; u also chooses its own estimate of the distance as one plus the distance in the minimum estimate. Thus if u itself is the smallest root, the distance u chooses for itself will be $-1 + 1 = 0$, as it should be. Thus $d_u[u]$ is set to $-1$ simply as a sentinel value to avoid an extra case.
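The sentinel trick can be sketched in a few lines of Python. This sketch assumes integer node IDs and stores the copies $r_u[w]$ and $d_u[w]$ as dictionaries keyed by the members of $neighborSet_u$. `choose_estimate` is a hypothetical helper, not a name from Figure 8.6. Python's built-in tuple comparison supplies the lexicographic order on (root, distance, parent) triples.

```python
# Sketch: choose (r_u, d_u, parent_u) using the hardwired self entry.
# Assumptions (illustrative only): integer IDs; r and d map each neighbor w
# to the stored copies r_u[w] and d_u[w].

def choose_estimate(u, r, d):
    r = {**r, u: u}    # hardwired self entry r_u[u] = u
    d = {**d, u: -1}   # sentinel d_u[u] = -1, so -1 + 1 == 0 when u is the root
    # Lexicographic minimum: smallest root, then distance, then neighbor ID.
    return min((r[w], d[w] + 1, w) for w in r)

# Node 5 with neighbors 2 and 8 that both report root 1.
print(choose_estimate(5, {2: 1, 8: 1}, {2: 3, 8: 2}))  # (1, 3, 8)
# Node 1 is smaller than any root it hears about, so it becomes the root.
print(choose_estimate(1, {2: 3, 8: 4}, {2: 0, 8: 0}))  # (1, 0, 1)
```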

Every node u periodically sends its own estimate of root and distance to all its neighbors using an "Announce" packet. To make $\mathcal{T}_u$ a node automaton, we have followed the convention (see Chapter 5) of first enqueuing the "Announce" packet on an outbound queue for the link, called $queue_u[v]$; the packet is then sent out from the queue when the link is free. The Announce packet is encoded as a tuple (Announce, r, d). When a node u receives an Announce packet from v, u first checks that the estimate in the packet is no greater than the previous estimate stored from v and that the distance in the packet is less than the maximum value n'. If both conditions hold, u stores the latest estimate it has received from v; u also updates its own estimate as the minimum of all neighbor estimates (including its own default estimate).

Let $\mathcal{T}_u$ be the automaton shown in Figure 8.6. We assume that for any u, v, all actions of the form $SEND_{u,v}(*)$ are in a separate class, and all $ENQUEUE_{u,v}$ actions are in a separate class. Similarly, for any u, all actions of the form $REPORT_u(*)$ are in a separate class.

Let $\mathcal{T}$ be the composition of the node automata $\mathcal{T}_u$ for every node u in G with a UDL for every edge (u,v) in G. In the next subsection we show that $\mathcal{T}$ is locally checkable for a predicate L that we define. In the following subsection, we show that $\mathcal{T}|L$ stabilizes to the stable tree behaviors in time proportional to the diameter of graph G. This sets the stage for applying the Global Correction Theorem to $\mathcal{T}$.

8.3.3 Local Predicates for $\mathcal{T}$

We state a set of local predicates for the spanning tree protocol. Let us denote by $\theta_{u,v}$ the intersection of the local predicates shown in Figure 8.7 for edge (u,v). Note that $\theta_{u,v}$ consists of two predicates, $\theta_1(u,v)$ and $\theta_2(u,v)$. $\theta_1(u,v)$ says that the sentinel values that node u uses (as a neighbor estimate from itself) are correct. It also says that node u's estimate $(r_u, d_u, parent_u)$ is the minimum, over all w in $neighborSet_u$, of $(r_u[w], d_u[w] + 1, w)$.

$\theta_2(u,v)$ says that the sequence of estimates sent from u to a neighbor v is non-increasing. If there is an (Announce, r, d) packet in transit from u to v, then the estimates that v already has stored from u can be no less than the estimates in the packet (i.e., $(r_v[u], d_v[u]) \ge (r, d)$). Also, the estimates in the packet can be no less than the estimates at u (i.e., $(r, d) \ge (r_u, d_u)$). If there is no (Announce, *, *) packet in transit from u to v, then the estimates that v already has stored from u can be no less than the current estimates at u (i.e., $(r_v[u], d_v[u]) \ge (r_u, d_u)$). Recall that a Unit Capacity Link allows only one outstanding packet on each link, just as in a UDL.
State

    neighborSet_u: set containing all neighbors of u and u itself
    parent_u: parent pointer, in neighborSet_u
    d_u: distance estimate, integer in the range 0..n', where n' is an upper bound on the number of nodes
    r_u: root estimate; all root estimates are in the range of node identifiers
    r_u[v]: estimate of r_v for each v in neighborSet_u, except r_u[u] = u
    d_u[v]: estimate of d_v for each v in neighborSet_u, except d_u[u] = -1
    free_u[v]: boolean, true if the link to v is free, for each v in neighborSet_u except u itself
    queue_u[v]: a queue containing at most one (Announce, *) packet to be sent to v

Actions

    FREE_{u,v} (*input action to receive free notification from link to neighbor v*)
        Effects: free_u[v] := true

    SEND_{u,v}(p) (*output action to send estimate to neighbor v*)
        Preconditions: p is the head of queue_u[v] and free_u[v] = true
        Effects: free_u[v] := false; remove p from queue_u[v]

    ENQUEUE_{u,v} (*output action to enqueue estimate to neighbor v*)
        Preconditions: queue_u[v] is empty
        Effects: add (Announce, r_u, d_u) to queue_u[v]

    RECEIVE_{v,u}(Announce, r, d) (*input action to receive estimate from neighbor v*)
        Effects:
            if (r, d) ≤ (r_u[v], d_u[v]) then (*received estimate is no greater than previous estimate*)
                if d < n' then (*received estimate not at max value*)
                    (r_u[v], d_u[v]) := (r, d) (*update estimate from v*)
                    (r_u, d_u, parent_u) := min{(r_u[w], d_u[w] + 1, w) : w ∈ neighborSet_u}

    REPORT_u(q) (*output action to report parent estimate*)
        Preconditions: q = parent_u

Figure 8.6: Spanning tree protocol. Code for a node u.
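For readers who prefer executable code, the automaton of Figure 8.6 can be transcribed into a Python class. This is a sketch under stated assumptions. The class, method, and driver names are hypothetical. The link layer is modelled by a driver `run_rounds` that delivers every packet immediately and then reports the link free. Each node's initial copy of neighbor v is (v, 0); after the initial recomputation, this start state satisfies both local predicates. Actions become methods, and preconditions become `if` guards.

```python
# Sketch: the node automaton of Figure 8.6 as a Python class.
# Assumptions (illustrative only): integer IDs, hypothetical method names,
# and a hypothetical lockstep driver standing in for the UDLs.
from collections import deque

class SpanningTreeNode:
    def __init__(self, u, neighbors, n_max):
        self.u = u
        self.n_max = n_max                        # n': upper bound on number of nodes
        self.nbrs = list(neighbors)
        self.rr = {u: u}                          # r_u[w]; self entry hardwired
        self.dd = {u: -1}                         # d_u[w]; sentinel -1 for self
        for v in self.nbrs:                       # legal initial copies (v, 0)
            self.rr[v], self.dd[v] = v, 0
        self.free = {v: True for v in self.nbrs}
        self.queue = {v: deque() for v in self.nbrs}
        self._recompute()

    def _recompute(self):
        self.r, self.d, self.parent = min(
            (self.rr[w], self.dd[w] + 1, w) for w in self.rr)

    def on_free(self, v):                         # FREE_{u,v}
        self.free[v] = True

    def enqueue(self, v):                         # ENQUEUE_{u,v}
        if not self.queue[v]:
            self.queue[v].append(("Announce", self.r, self.d))

    def send(self, v):                            # SEND_{u,v}: packet or None
        if self.queue[v] and self.free[v]:
            self.free[v] = False
            return self.queue[v].popleft()
        return None

    def on_receive(self, v, packet):              # RECEIVE_{v,u}
        _, r, d = packet
        if (r, d) <= (self.rr[v], self.dd[v]) and d < self.n_max:
            self.rr[v], self.dd[v] = r, d
            self._recompute()

    def report(self):                             # REPORT_u(parent_u)
        return self.parent

def run_rounds(graph, rounds, n_max):
    """Hypothetical driver: lockstep rounds, each packet delivered at once."""
    nodes = {u: SpanningTreeNode(u, graph[u], n_max) for u in graph}
    for _ in range(rounds):
        for u, node in nodes.items():
            for v in graph[u]:
                node.enqueue(v)
                pkt = node.send(v)
                if pkt is not None:
                    nodes[v].on_receive(u, pkt)
                    node.on_free(v)               # link reports itself free again
    return {u: node.report() for u, node in nodes.items()}

print(run_rounds({1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3]}, 5, n_max=4))
# {1: 1, 2: 1, 3: 1, 4: 2}
```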
Definition 8.3.1 For any edge (u,v), we let $L_{u,v}$ be the intersection of $\theta_{u,v}$ and $\theta_{v,u}$. Let $\mathcal{L}$ be the link predicate set containing $L_{u,v}$ for every (u,v) in G, and let $L = Conj(\mathcal{L})$.

It is easy to show that each $L_{u,v}$ is a closed local predicate.

Lemma 8.3.2 For a leader edge (u,v) and transition (s, a, s') of $\mathcal{T}$, if s satisfies $L_{u,v}$, then s' satisfies $L_{u,v}$.

Proof: First, $\theta_1(u,v)$ is clearly closed because the code never changes the values of $r_u[u]$ and $d_u[u]$. Also, whenever the code changes $(r_u, d_u, parent_u)$, it sets this 3-tuple equal to the minimum of $(r_u[w], d_u[w] + 1, w)$ over all w in $neighborSet_u$, as required by $\theta_1(u,v)$.

Next, $\theta_2(u,v)$ is closed because of a simple observation: the (r, d) estimates sent in (Announce, r, d) packets by node u are always non-increasing. This follows because node u changes its own estimate only after receiving an Announce packet from some neighbor w. But (see the code) u will not accept the packet from w unless the new estimate from w is no greater than the previous value that u has stored from w and the distance in the packet is less than the maximum value n'. If the new estimate from w is no greater than the previous value that u has stored from w, then the new estimate that u calculates will be no greater than before. If $\theta_2(u,v)$ holds in s and all estimates sent by u are non-increasing, then it is easy to see that $\theta_2(u,v)$ also holds in s'. Note that if u does not accept a packet from neighbor v (e.g., because the distance estimate in the packet is equal to n'), this does not affect $\theta_2(v,u)$.

Finally, since the above argument can be applied symmetrically to the link (v,u), it follows that $L_{u,v}$ is closed. ∎

Note that the check in the code that prevents u from accepting an estimate larger than a previously stored estimate is really an application of the heuristic of removing unexpected packet transitions (see Chapter 6). Clearly, if u receives an estimate from w that is larger than the previous estimate stored from w, then $\theta_2(w,u)$ is not true in the state before the estimate was received. Thus this is an unexpected packet transition; by removing such a transition, we ensure that $L_{u,v}$ is a closed predicate.

The following theorem is immediate from the last lemma.
Theorem 8.3.3 The network automaton $\mathcal{T}$ can be locally checked for L using $\mathcal{L}$.
Figure 8.7: Spanning Tree Protocol: Local Predicates for edge (u,v).

8.3.4 Fast Computation of Spanning Tree after all Link Predicates Hold

We now show that $\mathcal{T}|L$ stabilizes to the stable tree behaviors in time proportional to the diameter of G. Recall that $\mathcal{T}|L$ is the spanning tree automaton described in Figure 8.6, restricted so that all local predicates hold in the initial state.

The basic idea is to show first that if all local predicates hold then there can be 
no ghost roots; next, we show that if the spanning tree protocol begins in a state that 
does not contain a ghost root, then the protocol quickly converges to a stable spanning 
tree. 

We assume that each node has a unique ID that is modelled by the name of the 
node in the topology graph. Thus the unique ID of node u is u. Our spanning tree 
protocol depends heavily on the fact that each node has a unique ID that cannot be 
corrupted. Thus we are assuming that the node ID is part of the code at a node. 

We first define what it means to have a ghost root in a state of $\mathcal{T}$. Informally, consider the set of all the IDs present in state s (in all the root estimates at nodes and in all the Announce packets in the links). If the minimal ID of all nodes in the network is not the minimal ID in this set, then we have a ghost root. Formally:
Definition 8.3.4 Consider a state s of $\mathcal{T}$. We say that s has a ghost root r if r < u for all nodes u in G and either:

• There is an (Announce, r, *) packet in transit from u to v in s for some u, v, OR

• $r_u[v] = r$ in s for some u, v, OR

• $r_u = r$ in s for some u.

The only way we can have ghost roots in any execution is because of bad data in 
links and nodes when the protocol starts up. 

Lemma 8.3.5 Consider any execution $\alpha$ of $\mathcal{T}$. Suppose there is a state $s_i$ in $\alpha$ such that there is no ghost root in $s_i$. Then there is no ghost root in the state following $s_i$ in $\alpha$.

Proof: From the code, it is easy to see that the only way a new value is added to the set of existing root IDs is if a node u adds its own ID to the set (e.g., by changing its root estimate to the default). Since u is the ID of a node, it cannot be a ghost root. Thus ghost roots cannot be created after state $s_i$. ∎

Next, we show that any state in which all local predicates hold cannot contain a 
ghost root. 

Lemma 8.3.6 There is no ghost root in any state of $\mathcal{T}|L$.

Proof: Suppose we did have a ghost root in some state s of $\mathcal{T}|L$. Then there must be a ghost root r with the lowest ID among ghost roots in s. If r is in an Announce packet in transit from, say, v to u, then we know from $\theta_2(v,u)$ that $r_v \le r$. Since any such value would also be a ghost root, and r is the lowest one, $r_v = r$. On the other hand, if there is some w such that $r_w[x] = r$ or $r_w = r$, then we know from $\theta_1(w,*)$ that $r_w \le r$, and hence $r_w = r$. Thus we have some node, say w, such that $r_w = r$.

By $\theta_1$, if $parent_w \neq w$, we can continue inductively following parent pointers. Each time we move from, say, w to $x = parent_w$, we know from $\theta_1(w,*)$ that $(r_w, d_w) = (r_w[x], d_w[x] + 1)$. But since $r_w = r$, $(r_w[x], d_w[x]) < (r, d_w)$. We also know from $\theta_2(x,w)$ that $(r_x, d_x) \le (r_w[x], d_w[x])$. Thus we conclude that $(r_x, d_x) < (r, d_w)$.
But the root estimates in this chain are all equal to r, since r is the lowest ID ghost root, so the distance estimates strictly decrease along the chain; since they are also non-negative, the process of following parent pointers must terminate at some node, say z, such that $parent_z = z$ and $r_z = r$. Thus by $\theta_1$ applied to z, we know that r = z. But that contradicts the fact that r is a ghost root. ∎

The last important fact to observe is that if there are no ghost roots and all local predicates hold, then the protocol converges in time proportional to the diameter of the network. As usual, we say that $t = O(d)$ to mean $t \le cd$, where c is a constant. We denote the diameter of G by D.

Lemma 8.3.7 Consider an execution $\alpha$ of $\mathcal{T}|L$. Then there is some t-suffix of execution $\alpha$ (where $t = O(D)$) whose behavior is a stable tree behavior for graph G.

Proof: We know from Lemma 8.3.6 that there is no ghost root in the initial state of $\alpha$, and hence by Lemma 8.3.5 there is no ghost root in any state of $\alpha$. Since $L_{u,v}$ is closed (Lemma 8.3.2), we know that $\theta_1(u,v)$ and $\theta_2(u,v)$ hold for all links (u,v) in all states of $\alpha$. Let q be the node with the smallest ID in graph G. We prove the lemma using induction on the distance d of a node u from q.

Inductive Hypothesis: There exists some constant c such that within $c \cdot d$ time after the start of $\alpha$, there is some state in which $(r_u, d_u, parent_u) = (q, d, v)$, where v is some neighbor of u at distance d - 1 from q. Also, u will not change $r_u$, $d_u$, or $parent_u$ from this state onwards in $\alpha$.

The inductive hypothesis implies the lemma because it implies that within time 
proportional to the diameter, every node has a parent pointer that points to a node that 
is one hop closer to the root q. It also implies that the parent pointers do not change 
after this point. Also, since for each u all $REPORT_u(*)$ actions are in a separate class, such a $REPORT_u(*)$ action must occur in constant time in any suffix of an execution.

Basis, d = 0: Node q is the only node at distance 0 from itself. In all states $s_i$ of $\alpha$, $(r_q, d_q, parent_q) = (q, 0, q)$; otherwise there would be a ghost root in state $s_i$, which we have already ruled out.

Inductive Step: Suppose the hypothesis is true for all nodes at distance d from q; we wish to show that it is true for a node v at distance d + 1 from q. Thus there is some state $s_i$ such that all nodes u at distance d from q have $(r_u, d_u, parent_u) = (q, d, *)$. Also, $s_i$ occurs within $c \cdot d$ time after the start of $\alpha$. We first claim that in all states of $\alpha$, $(r_v, d_v) \ge (q, d + 1)$. Suppose this were not true in some state $s_k$ of $\alpha$. Then, by following parent pointers as we did in the proof of Lemma 8.3.6 and by using predicates $\theta_1$ and $\theta_2$ iteratively, we can show that we must end in a ghost root. Since we have ruled out ghost roots, $(r_v, d_v) \ge (q, d + 1)$ in all states of $\alpha$.

Next, v must have some neighbor at distance d from q. Let u be the neighbor with the lowest ID among the neighbors of v at distance d from q. Then, by the properties of a UDL, within constant time after state $s_i$, an (Announce, q, d) packet from u will arrive at v, which will cause $(r_v, d_v) = (q, d + 1)$. (Note that this packet will be accepted at v: $\theta_2(u,v)$ guarantees that the estimate v has stored from u is no less than (q, d), and d < n'. Since we have just shown that $(r_v, d_v) \ge (q, d + 1)$ in every state, the new minimum computed by v is exactly (q, d + 1).) Also, since u is the lowest ID neighbor at distance d from q, v will set $parent_v = u$, and this will remain unchanged for the rest of the execution. ∎

Theorem 8.3.8 $\mathcal{T}|L$ stabilizes to the stable tree behaviors for graph G in $O(D)$ time, where D is the diameter of G.

Proof: Immediate from Lemma 8.3.7. ∎

At this stage, we could apply the Global Correction Theorem to obtain a stabilizing version of $\mathcal{T}$. However, there is an even simpler transformation because $\mathcal{T}$ is one-way
checkable. Before we present the final version of the stabilizing spanning tree protocol, 
we first define what we mean by a one-way checkable protocol. 

8.4 One Way Predicates and Local Checking 

In Section 8.1, we claimed that every locally checkable protocol could be stabilized 
using the global reset protocol of Chapter 7. The proof sketch suggested that this 
could be achieved by periodically doing a local snapshot of each local subsystem and 
making a reset request if a violation is detected. 

However, in this section we will show that if a locally checkable protocol P is also 
what we call one-way checkable, then we can locally check P using a simpler and faster 
method than doing a local snapshot. In this method, each node periodically sends its 
entire state to each of its neighbors. We can call this local checking by periodic sending 
of state or simply periodic sending. The question that remains is: when is periodic 
sending adequate to detect local violations in lieu of a local snapshot? The answer is that periodic sending is sufficient if each local predicate is what we call a separable local predicate.

Intuitively, a separable local predicate can be separated into two one-way predicates, one for each direction of a link subsystem. A one-way predicate for link (u,v) is a predicate that involves only the state of u, the state of the link (u,v), and the state of node v. Note that it does not depend on the state of the link (v,u). Formally:

Definition 8.4.1 Consider any network automaton M with graph G. Let (u,v) be any edge in G. We say that $\theta_{u,v}$ is a one-way predicate for edge (u,v) of M if:

• $\theta_{u,v}$ is a local predicate of M for edge (u,v).

• For any two states s and s', suppose s satisfies $\theta_{u,v}$, $s|u = s'|u$, $s|v = s'|v$, and $s|(u,v) = s'|(u,v)$. Then s' satisfies $\theta_{u,v}$.

For example, consider the second predicate of the spanning tree protocol listed in Figure 8.7. Recall that it essentially says that the estimate values in transit from u to v are always non-increasing. Clearly this is a one-way predicate. It is not hard to see that the first predicate (which involves only the state of one node) is also a one-way predicate.

Definition 8.4.2 Consider any network automaton M with graph G. Let (u,v) be any edge in G. We say that $L_{u,v}$ is a separable local predicate for edge (u,v) of M if there exist two one-way predicates $\theta_{u,v}$ and $\theta_{v,u}$ (for edges (u,v) and (v,u), respectively) such that:

Any state s of M satisfies $L_{u,v}$ iff s satisfies both $\theta_{u,v}$ and $\theta_{v,u}$.

Once again it is easy to see that the spanning tree protocol has a link predicate 
set that consists of separable predicates. In this case, we say that the spanning tree 
protocol is one-way checkable. 

Definition 8.4.3 A network automaton M is one-way checkable for predicate L using link predicate set $\mathcal{L}$ if:

• M is locally checkable for predicate L using link predicate set $\mathcal{L}$.

• Each $L_{u,v} \in \mathcal{L}$ is a separable local predicate.

We now claim the following fact informally. Any one-way checkable automaton can 
be locally checked by periodic sending. We simply add actions to send the state of a 
node periodically to each of its neighbors and actions to receive such packets. 

Suppose that in state s, node u receives such a periodic packet from node v containing the value x. Then u checks whether $(x, nil, nil, s|u) \in \theta_{v,u}$; if not, u detects a local violation.

Notice that local checking by periodic sending is not just simpler than doing a 
local snapshot but is also faster. The proof of Lemma 5.6.12 in Chapter 5 tells us 
that it takes 6 checking/correction phases for the local snapshot protocol to stabilize. 
Roughly, this means that the local snapshot protocol takes 12 link delays (where a link 
delay is the time to send a control packet from a node to its neighbor) to stabilize. 
By contrast, the periodic sending protocol takes 1 link delay to stabilize. Thus for 
one-way checkable protocols it always pays to use periodic sending for local checking. 

8.5 Simple Stabilizing Spanning Tree Protocol 

We have shown that the spanning tree protocol described in Figure 8.6 is locally checkable. Thus we can apply the Global Correction Theorem to this protocol, which requires checking link predicates using local snapshots. However, since the spanning tree protocol described in Figure 8.6 is also one-way checkable, we can replace the local snapshot by periodic sending of state information. The resulting protocol is simpler and faster than a spanning tree protocol that uses local snapshots.

If we look at the protocol in Figure 8.6, we see that each node periodically sends (Announce, r, d) packets containing the root and distance estimates at the node. If we look at the predicates in Figure 8.7, $\theta_1(u,v)$ involves only variables at u, and $\theta_2(u,v)$ involves only the root and distance estimates of u (together with the copies of them in transit and stored at v). Thus we can rely on the (Announce, r, d) message for periodic checking. This is shown in the code of Figure 8.8, which describes only the changes to the code of Figure 8.6 needed to convert it into a stabilizing protocol.
The first change is that we have applied message renaming to the original protocol and replaced all packet events with message events.

The rule for one-way checking is as follows. If in state s node u receives a periodic message from node v containing the value x, then u checks whether $(x, nil, nil, s|u) \in \theta_{v,u}$; if this is not true, u detects a local violation. In our case, the (Announce, r, d) message does not contain the complete state x of node v, but only a projection of the state of node v that is sufficient for checking. But $(x, nil, nil, s|u) \in \theta_2(v,u)$ iff $(r, d) \le (r_u[v], d_u[v])$. Thus to check whether $\theta_2(v,u)$ holds, it is sufficient to check whether $(r, d) \le (r_u[v], d_u[v])$ and to make a reset request if a violation is detected. Also notice that we do not need to check the $\theta_1(v,u)$ predicate, because we have added a few lines of code that ensure that $\theta_1(v,u)$ will hold after the first Announce message received by v.
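The receive action of Figure 8.8 can be sketched in Python as follows. This assumes estimates are (root, distance) pairs compared lexicographically, that the value infinity is modelled by `math.inf`, and that u's table of estimates contains an entry for u itself (as the SIGNAL action suggests). The class and field names, including `n_max` for $n'$, are hypothetical, and the FREE/SEND message machinery is not modelled.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Estimate = Tuple[float, float]  # (root, distance); Python compares tuples lexicographically


@dataclass
class TreeNode:
    uid: int
    neighbors: List[int]
    n_max: int                                   # hypothetical name for n', the max distance value
    est: Dict[int, Estimate] = field(default_factory=dict)  # (r_u[v], d_u[v]) for each v
    requestbit: bool = False
    r: float = 0
    d: float = 0
    parent: int = 0

    def __post_init__(self) -> None:
        self.signal()                            # start from the state a reset signal produces

    def signal(self) -> None:
        """SIGNAL_u: reset all estimates (free_u[v] is not modelled here)."""
        for v in self.neighbors:
            self.est[v] = (math.inf, math.inf)
        self.est[self.uid] = (self.uid, -1)
        self.r, self.d, self.parent = self.uid, 0, self.uid

    def receive_announce(self, v: int, r: int, d: int) -> None:
        """RECEIVE^M_{v,u}(Announce, r, d), including the one-way check of theta_2(v, u)."""
        if (r, d) <= self.est[v]:                # received estimate is no greater than previous
            if d < self.n_max:                   # distance estimate not already at max value
                self.est[v] = (r, d)
        else:
            self.requestbit = True               # violation: REQUEST_u will ask for a reset
        self.est[self.uid] = (self.uid, -1)      # together with the next line, makes theta_1 hold
        self.r, self.d, self.parent = min(
            (root, dist + 1, w) for w, (root, dist) in self.est.items()
        )
```

For example, a node with id 5 that hears (Announce, 1, 3) from neighbor 2 adopts root 1 at distance 4 via parent 2; if neighbor 2 later announces a larger estimate such as (1, 5), the check fails and `requestbit` is set.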

Let $T'_u$ be the modified node automaton shown in Figure 8.8. Let $\mathcal{R}|L$ be the reset automaton for graph G as described in Chapter 7. Let T be the automaton formed by composing $T'_u$ for each node u with $\mathcal{R}|L$.

Then we have the following theorem: 

Theorem 8.5.1 T stabilizes to the stable tree behaviors for graph G in $O(n)$ time.

Proof: (Sketch) We first show that T stabilizes to the behaviors of $R(T(c)|L)$ in $O(n)$ time. The operator R denotes message renaming, as in the Global Correction Theorem. This part of the proof uses arguments similar to those used in the proof of the Global Correction Theorem, except that the arguments are simpler because of the use of one-way predicates and one-way checking.

Next, we know from Theorem 8.3.8 that $T|L$ stabilizes to the stable tree behaviors of graph G in $O(D)$ time, where D is the diameter of G and is no greater than n, the number of nodes. Let P denote the set of stable tree behaviors. If $T|L$ stabilizes to the behaviors in P in $O(D)$ time, then so does $R(T(c)|L)$, because P is invariant under message renaming and under scaling by constant factors. Finally, since behavior stabilization is transitive, we conclude that T stabilizes to the behaviors in P in $O(D + n) = O(n)$ time. ∎

Finally, we note that because $T'_u$ is a UIOA and $\mathcal{R}|L$ is a CIOA, the modularity theorem assures us that we can replace $\mathcal{R}|L$ by its stabilizing implementation ($\mathcal{R}^+$, as described in Chapter 7) without affecting the result. This is an elegant application of the modularity theorem because it means that we can begin with a stabilizing version of a UDL; next, as shown in Chapter 7, we can use the stabilizing UDL implementations to construct a stabilizing version of a reset protocol; and finally we can use the stabilizing version of the reset protocol to construct a stabilizing spanning tree protocol. In the process, we have two applications of the modularity theorem. We can go even further and change T so that it offers a suitable interface to the token passing example of Chapter 6. If we do so, we could construct a stabilizing token passing protocol using the stabilizing spanning tree, once again relying on the modularity theorem.

In any of these modular constructions, we can replace major pieces of the construc- 
tion by other modules that offer the same external behavior. 

8.6 Stabilizing topology update protocol 

The topology update task is to broadcast the list of incident operating edges at a node 
to all other network nodes. Thus the goal of the topology update task is to produce 
at each node u a database listing the operating edges of each node that is reachable 
from u. 

The following simple strategy is often used for topology maintenance. Each node 
maintains a sequence number. Whenever a change occurs, the incident nodes increment 
their sequence number and broadcast the new status of all their incident links in a 
link state packet ([Per83]). Any link state packet sent by node u contains node u's
sequence number. Whenever a node v receives a message purporting to be from u, v 
checks whether the sequence number on the message is larger (i.e., newer) than the 
sequence number of the last update v has stored from u. If so, v stores this new update 
from u (after discarding any previous update it has stored from u) and broadcasts the 
update to all neighbors. Otherwise, the message is simply discarded as an outdated 
update. 

Now the sequence number field is finite. Even if we allocate 64 bits for sequence 
numbers, it is always possible (due to errors) for a link state packet with a maximum 
sequence number value to be present in the initial state of the network. 
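The acceptance rule and the effect of a corrupted maximum-valued sequence number can be sketched as follows. This is a minimal illustration, not the [Per83] packet format: the 4-bit field width and the names `LinkStatePacket` and `TopologyNode` are hypothetical, chosen so that the problem is easy to see.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet

SEQ_BITS = 4                     # hypothetical, deliberately tiny field width
MAX_SEQ = 2 ** SEQ_BITS - 1      # largest representable sequence number (15)


@dataclass(frozen=True)
class LinkStatePacket:
    source: str                  # originating node u
    seq: int                     # u's sequence number, 0 <= seq <= MAX_SEQ
    links: FrozenSet[str]        # u's operating incident links


class TopologyNode:
    def __init__(self, name: str) -> None:
        self.name = name
        self.db: Dict[str, LinkStatePacket] = {}   # last update stored from each source

    def receive(self, lsp: LinkStatePacket) -> bool:
        """Store lsp if it is newer; return True iff it should be flooded onward."""
        stored = self.db.get(lsp.source)
        if stored is None or lsp.seq > stored.seq:
            self.db[lsp.source] = lsp            # replaces any previous update from the source
            return True
        return False                             # outdated update: discard
```

Once the stored copy carries MAX_SEQ, no genuine update from u, whatever sequence number it carries, can ever replace it; this is the failure that any stabilizing version must remove.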

In the ARPANET [MRR80] and DECNET [Per83] protocols, erroneous updates 
with the maximum number are removed by what we had earlier called "timer flushing".
State

  requestbit_u    boolean, true if a request is to be done later.
  All other variables remain unchanged.

Modified Actions

  FREE^M_{u,v}    (*message-renamed free action, replaces the FREE_{u,v} action*)
    Effects: free_u[v] := true

  SEND^M_{u,v}(p)    (*message-renamed output action to send estimate to neighbor v*)
    Preconditions: p is head of queue_u[v] and free_u[v] = true
    Effects: free_u[v] := false; remove p from queue_u[v]

  RECEIVE^M_{v,u}(Announce, r, d)    (*input action to receive estimate from neighbor v*)
    Effects:
      If (r, d) ≤ (r_u[v], d_u[v]) then    (*received estimate is no greater than previous estimate*)
        If d < n' then    (*distance estimate not already at max value*)
          (r_u[v], d_u[v]) := (r, d)    (*update estimate from v*)
      Else requestbit_u := true    (*if received estimate is greater, make reset request later*)
      (r_u[u], d_u[u]) := (u, -1);    (*next two lines make θ1(u, v) hold*)
      (r_u, d_u, parent_u) := min{(r_u[v], d_u[v] + 1, v) : v ∈ neighborSet_u}

  REQUEST_u    (*output action to request a reset*)
    Preconditions: requestbit_u = true
    Effects: requestbit_u := false

  SIGNAL_u    (*input action to receive a reset signal*)
    Effects:
      For all v ≠ u, v ∈ neighborSet_u do    (*reset all estimates*)
        (r_u[v], d_u[v]) := (∞, ∞)
        free_u[v] := false
      (r_u[u], d_u[u]) := (u, -1);
      (r_u, d_u, parent_u) := (u, 0, u)

Figure 8.8: Spanning tree protocol. Modifications to the code for a node u.
Each update (and hence the associated counter value) is assumed to remain in the network only for some limited "lifetime," after which it is discarded. This prevents the problem because after its lifetime expires, an erroneous counter value is no longer present in the network. Once again, the disadvantage of timer flushing is that the timeout periods have to be long, which results in long stabilization times.

Instead of using timer flushing, we can use global correction to create a faster stabilizing topology update protocol. This is possible because the protocol is easily seen to be locally checkable for the predicate L which states that no sequence number present in the network has the maximum value. Thus we can apply the Global Correction theorem to ensure that maximum-valued sequence numbers are removed from the network. On receiving a signal, each node deletes the link state packets of all nodes other than itself and resets its own sequence number to 0. A node that does not have a link state packet from u will accept any update sent by u as being newer.
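A sketch of the two pieces that global correction adds, under stated assumptions: a node treats receipt of a maximum-valued sequence number as a local violation of L and requests a reset, and the reset signal is delivered by calling `signal()`. The class and method names are hypothetical, the node's own link state is kept outside `db`, and the reset protocol of Chapter 7 is not modelled.

```python
from typing import Dict, FrozenSet, Tuple

MAX_SEQ = 15                     # hypothetical 4-bit sequence-number field


class StabilizingTopologyNode:
    def __init__(self, name: str) -> None:
        self.name = name
        self.own_seq = 0
        self.db: Dict[str, Tuple[int, FrozenSet[str]]] = {}   # other source -> (seq, links)
        self.reset_requested = False

    def receive(self, source: str, seq: int, links: FrozenSet[str]) -> bool:
        """Return True iff the update is accepted (and would be flooded onward)."""
        if seq == MAX_SEQ:
            # Local check of predicate L failed: a maximum-valued number is present.
            self.reset_requested = True
            return False
        stored = self.db.get(source)
        if stored is None or seq > stored[0]:
            self.db[source] = (seq, links)
            return True
        return False

    def signal(self) -> None:
        """Reset signal: drop every other node's update and restart numbering at 0."""
        self.db.clear()
        self.own_seq = 0
        self.reset_requested = False
```

After the signal, the node holds no packet from u, so the next update from u is accepted whatever its (non-maximal) sequence number.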

Thus, the number of bits allocated for the sequence number affects only the performance of the algorithm: errors that cause the sequence numbers to reach the maximum value incur a performance penalty, namely a network latency of $O(n)$.

A complete design of a stabilizing topology update protocol would also have to add a number of other actions, as suggested in [Per83]. For instance, each node must periodically increment its sequence number and send its latest link state packets to all its neighbors. Basically, we suggest keeping the essential simplicity of [Per83] and only replacing the need for global timers in [Per83] with global correction.

8.7 Summary 

The major point of this chapter is to show that any locally checkable protocol can be stabilized using the global reset protocol developed in the last chapter. In general, this is done by having the leader of every link subsystem do a periodic snapshot; if the leader detects a local violation, it makes a reset request. However, in the special case that the local predicates can be separated into one-way predicates, we can dispense with the snapshot and use periodic sending of state. Local checking by periodic sending is simpler and faster than using a local snapshot.

An important example of both techniques is furnished by the spanning tree protocol described in Section 8.3. Both the spanning tree and topology update protocols described in this chapter are quite practical and could be used in real networks.

The topology update protocol demonstrates a useful paradigm for designing stabi- 
lizing protocols. The essence of the idea is to design a stabilizing protocol assuming 
the use of unbounded counters. Next, we use global correction to convert this protocol 
into a more practical protocol that uses bounded counters. 

This chapter also demonstrates why the specification of a reset protocol must spec- 
ify the behavior in non-final signal intervals and not just in the final signal intervals. 
The proof of the Global Correction Theorem relies strongly on the mating relation between non-final intervals. The intuition is that the augmented automaton is doing local checking
of the system during non-final intervals; if local checking does not eventually observe 
consistent behavior, the augmented automaton may keep making reset requests and 
the protocol may never stabilize. 
Chapter 9

Compiling Synchronous Protocols 



In the previous chapters we have shown how we can stabilize certain protocols by 
local checking and/or correction of the state of a node and its neighbors. We applied 
this technique to obtain stabilizing solutions to some important tasks such as mutual 
exclusion, network resets, and computing spanning trees. By contrast, this chapter 
provides a general technique for a special class of problems (known as non-interactive
tasks), for many of which (e.g., Minimal Spanning Tree, Min Cost Flows, etc.) no
locally checkable implementation is known. In fact, for many of the problems solved
in this chapter, no efficient stabilizing solution was known previously. 

In this chapter, we describe two compilers (Sections 9.5 and 9.6) that convert a deterministic synchronous protocol $\pi$ into a stabilizing, asynchronous version of $\pi$. In essence, we efficiently transform a solution for the most restrictive model (synchronous, fault-free networks) into a solution that works in a very permissive model (asynchronous networks with catastrophic faults that stop).

Let $T_\pi$ be the time complexity of $\pi$ and $S_\pi$ be the space complexity of $\pi$. Let n be the number of nodes in the network. The first compiler (Rollback, Section 9.5) produces a version of $\pi$ that stabilizes in time $O(T_\pi)$ and has space complexity $O(T_\pi \cdot S_\pi)$. Thus the first compiler is extremely efficient if the time complexity of the original protocol $\pi$ is very small, e.g., $T_\pi = O(\log n)$. The second compiler (the Resynchronizer, Section 9.6) produces a version of $\pi$ that stabilizes in time $O(n + T_\pi)$ and has the same space complexity as $\pi$. Thus the second compiler is extremely efficient if the time complexity of the original protocol $\pi$ is at least $\Omega(n)$.

These two compilers allow us to provide efficient stabilizing solutions for many 
problems including minimum spanning trees, colorings, maximum flows, and maximal
independent sets. 

Despite the apparent change of direction, the ideas in this chapter are closely re- 
lated to the ideas in the previous chapters. First, as we have said earlier, the ideas in 
this chapter extend the range of our general techniques. Our previous general tech- 
niques (Local, Tree and Global Correction) only apply to protocols that are locally 
checkable. Second, the compilers in this chapter are implemented using the techniques developed earlier. The first compiler is based on a simplified form of local
checking and correction that we call one-way checking and correction (see Chapter 8). 
The second compiler is based on local checking and global correction. 

When we first presented these results in [AV91], the second compiler, the Resynchronizer, 
was based on a complicated construction. In this chapter, we provide a simplified con- 
struction using the reset protocol described in Chapter 7. We do not have a complete 
proof of the simplified construction, but we will outline why we believe our simpli- 
fied construction is correct. Thus while our confidence in the Resynchronizer result is 
based on the original result in [AV91], we believe that the construction in this chapter 
offers the potential for a much simpler solution. 

This chapter is organized as follows. First, we describe how we model non-interactive protocols and synchronous protocols. Next, we summarize the major results of the chapter. Then we contrast the notion of distributed checking (that we use in this chapter) with the independent local checking that we have used in previous chapters. Next we briefly describe the first compiler, Rollback, and then describe the simplified version of the Resynchronizer compiler. Finally, we outline extensions for randomized protocols.

9.1 Non-interactive Protocols 

Non-interactive protocols form an important subclass of distributed algorithms. These are protocols whose correctness can be specified by a relation (the I/O relation) between their input and output. For example, if the protocol has to compute a spanning tree, then the output (the tree) should be a spanning tree of the input (the graph G). By contrast, mutual exclusion is an interactive task because (see Chapter 6) its correctness must be expressed in terms of sets of valid behaviors or sets of valid executions.
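For the spanning tree example, membership of a pair (I, O) in the I/O relation can be tested directly. This is a sketch under assumptions not fixed by the text: the input I is an undirected adjacency map, the output O gives a parent pointer at each node, and the root is the unique node that is its own parent. The name `is_spanning_tree` is hypothetical.

```python
from typing import Dict, Hashable, List

Node = Hashable


def is_spanning_tree(adj: Dict[Node, List[Node]], parent: Dict[Node, Node]) -> bool:
    """True iff the parent pointers (the output O) form a spanning tree of adj (the input I)."""
    roots = [u for u in adj if parent[u] == u]
    if len(roots) != 1:
        return False                       # exactly one root is required
    for u in adj:
        if parent[u] != u and parent[u] not in adj[u]:
            return False                   # every tree edge must be a graph edge
    for u in adj:                          # every node must reach the root: no cycles
        seen = set()
        w = u
        while parent[w] != w:
            if w in seen:
                return False
            seen.add(w)
            w = parent[w]
    return True
```

Note that the relation says nothing about how the tree was computed, only which outputs are acceptable for a given input.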
We define the notion of a non-interactive protocol more formally using a slight variant of the network model introduced in Chapter 5. Recall that in Chapter 5, we modelled a network using a topology graph G = (V, E, l), where V is a set of nodes, E is a set of directed edges and l is a leader function. For this chapter we augment the notion of a topology graph with an extra component I that represents the input. We define an augmented topology graph G = (V, E, l, I) where V, E and l are as before. I, however, is a vector of inputs such that for any node u, $I_u$ represents the local input at node u. (In previous chapters the only input to a node automaton for node u was the list of neighbors of u and, in the case of a tree topology, the parent of u.)

We start by fixing an input domain $\mathcal{I}$ and an output domain $\mathcal{O}$.

Definition 9.1.1 An augmented topology graph G = (V, E, l, I) is a topology graph (V, E, l) together with an input specifier I, where I is a vector of local inputs $(I_u)$ and $I_u \in \mathcal{I}$ for all $u \in V$.

Informally, this amounts to modelling the input to the non-interactive protocol as 
part of the local code of each node (which cannot be corrupted) and not as part of 
the state of each node (which can be corrupted). Clearly no non-interactive protocol 
can hope to produce correct output from corrupted input! However, more generally, 
the input to the protocol could be the output of another stabilizing protocol. Thus in 
Chapter 5, we assumed the list of neighbors of each node u was part of the code at 
node u. But in many real networks, the list of neighbors of a node would be provided 
as the output of a stabilizing Data Link protocol. In this chapter, we will often refer 
to an augmented topology graph as an augmented graph. 

Once again, we will model the protocol using node automata $N_u$ (one for each node u in G), and unit capacity data links $C_{u,v}$ (one for each edge (u, v) in G).

Definition 9.1.2 An automaton for augmented graph G = (V, E, l, I) is an automaton consisting of an IOA for each $u \in V$ together with a UDL $C_{u,v}$ for each $(u,v) \in E$.

Refer to Chapter 5 for a definition of the UDL $C_{u,v}$. Recall that for any state s of the automaton we let s|u denote the state of the IOA for node u. For an augmented graph, we also let $(X_u)$ denote a vector consisting of an element $X_u$ for each node u in the graph.

For non-interactive protocols we will also assume that there is some local output function $O_u$ of the state of each $N_u$, such that $O_u(s)$ provides the local output of $N_u$ in state s. Let M be the network automaton formed by the composition of the node and channel automata as described in Chapter 5. Using the $O_u$ we can define the output function O to be a function of the state s of the network automaton M such that O(s) produces a vector of local outputs that is identical to the outputs produced by each $O_u$ when applied to s|u. Thus:

Definition 9.1.3 Let M be an automaton for augmented graph G = (V, E, l, I). We say that O is an output function for M if O is a vector of functions $(O_u)$ such that for every state s of M and for every node $u \in V$, $O_u(s|u) \in \mathcal{O}$. We will also abuse notation by defining $O(s) = (O_u(s|u))$, $\forall u \in V$.

Traditionally, the correctness of a non-interactive protocol M is described using an input-output relation R. In the traditional definitions, M is allowed to be an ordinary IOA (i.e., we are allowed to specify initial states). The correct executions of M are those executions α of M in which there is some suffix γ of α such that for all states s in γ, $(I, O(s)) \in R$. In other words, in all executions of the protocol the output must eventually "settle" to a value that satisfies the I/O relation. Thus:

Definition 9.1.4 Let M be an automaton for augmented graph G = (V, E, l, I). An I/O relation for M is a set of tuples (I', O') where each I' is a vector $(I'_u)$ with $I'_u \in \mathcal{I}$ and each O' is a vector $(O'_u)$ with $O'_u \in \mathcal{O}$.

Finally we define a non-interactive protocol to consist of two components: an 
automaton and an output function. 

Definition 9.1.5 A non-interactive protocol for augmented graph G is a tuple (M, O) where M is an automaton for augmented graph G and O is an output function for M.

For stabilization, we will keep the correctness definition exactly as in the traditional definitions, except that M will typically be a UIOA.
Definition 9.1.6 Let $\mathcal{P} = (M, O)$ be a non-interactive protocol for augmented graph G = (V, E, l, I) and let R be an I/O relation for M. Let C be the set of executions α of M such that for all states s in α, $(I, O(s)) \in R$. We say that non-interactive protocol $\mathcal{P}$ stabilizes to I/O relation R in time t if M stabilizes to the executions in C in time t.

Thus this definition is a special case of the general definition of "stabilizes to the 
executions of" given in Chapter 3. 

We now formally define two complexity measures that we use to evaluate stabilizing 
non-interactive protocols: the first measures the worst-case time for the protocol to 
stabilize, and the second measures the worst-case amount of space used by any node. 

We define the stabilization time of a non-interactive protocol $\mathcal{P}$ with respect to I/O relation R to be the infimum of all t such that $\mathcal{P}$ stabilizes to R in time t.

The space complexity of $\mathcal{P} = (M, O)$ is the worst case across all nodes u of the size of the set $\{s|u : s \in states(M)\}$.

9.2 Synchronous Protocols 

We model a deterministic synchronous protocol $\pi$ as follows. The network topology is again specified by an augmented graph G = (V, E, l, I), where $I_u$ once again represents the input to node u. The protocol is synchronous because it works in rounds numbered from 1 to $T_\pi$, where $T_\pi$ is some finite, known bound on the running time. The channels are no longer UDLs but simply obey the property that any packet sent on a channel at the start of a round is delivered by the end of the round. Similarly, the node protocol at each node u is no longer modelled by a node automaton but instead by a 4-tuple $(s^0_u, F_u, M_u, O_u)$, where $s^0_u$ is an initial state, $F_u$ is a state transition function, $M_u$ is a message generation function, and $O_u$ is the output function. All these correspond to the standard notions for such protocols, but we explain them briefly below.

The state s of the synchronous protocol consists of the state s|u of each node in G. An execution of the synchronous protocol is generated as follows. At the start of the first round, each node u is placed in the initial state $s^0_u$ and all channels are empty. At the start of each round, a node sends a packet to each neighbor. The contents of the packet are determined by the message generation function $M_u$, which takes as input the state of u at the start of the round. At the end of each round, a node changes its state using the state transition function $F_u$, which takes as input the messages received by u in the round as well as the state at the start of the round.
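The round structure can be made concrete with a small centralized simulator. This is a sketch under assumptions not in the text: a node sends the same packet to every neighbor, each function takes the node name so that one Python function can serve for all nodes, and the example protocol (flooding the minimum input for $T_\pi$ rounds) is purely illustrative. `run_synchronous` is a hypothetical name.

```python
from typing import Any, Callable, Dict, Hashable, List

Node = Hashable


def run_synchronous(adj: Dict[Node, List[Node]], s0: Dict[Node, Any],
                    M: Callable[[Node, Any], Any],
                    F: Callable[[Node, Any, Dict[Node, Any]], Any],
                    O: Callable[[Node, Any], Any], T: int) -> Dict[Node, Any]:
    """Execute T rounds of a protocol given by (s0_u, F_u, M_u, O_u); return each O_u(s|u)."""
    state = dict(s0)                          # each node starts in s0_u; channels are empty
    for _ in range(T):
        inbox: Dict[Node, Dict[Node, Any]] = {u: {} for u in adj}
        for u in adj:                         # start of round: u sends M_u(state) to neighbors
            msg = M(u, state[u])
            for v in adj[u]:
                inbox[v][u] = msg
        # end of round: F_u combines the old state with every message received this round
        state = {u: F(u, state[u], inbox[u]) for u in adj}
    return {u: O(u, state[u]) for u in adj}


# Illustrative protocol: flood the minimum input value for T rounds.
def send_value(u: Node, s: int) -> int:
    return s


def keep_min(u: Node, s: int, msgs: Dict[Node, int]) -> int:
    return min([s, *msgs.values()])


def output(u: Node, s: int) -> int:
    return s


adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
s0 = {"a": 5, "b": 9, "c": 2}
out = run_synchronous(adj, s0, send_value, keep_min, output, T=2)
```

With T = 2 (the diameter of this path) every node outputs 2; after a single round, node a has not yet heard of that value and still outputs 5.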

The output of $\pi$ in any execution is determined by the value of $O_u(s|u)$ for every node u, where s|u is the state of node u at the end of round $T_\pi$. Exactly as for non-interactive protocols, we define an output function O to be a function of the state s of $\pi$ such that O(s) produces a vector of local outputs that is identical to the outputs produced by each $O_u$ when applied to s|u.

Exactly as for non-interactive protocols, we define the correctness of synchronous protocols using an I/O relation R. The synchronous protocol $\pi$ is said to be correct if all runs of $\pi$ end in a state s such that $(I, O(s)) \in R$.

The time complexity of a synchronous protocol $\pi$ is the number of rounds, $T_\pi$. The space complexity of $\pi$ is the worst case across all nodes u of the number of states of u.

Let $\pi$ be a synchronous protocol with time complexity $T_\pi$ and input-output relation R. We also introduce the notion of a checker for $\pi$. A synchronous protocol $\chi$ is said to be a checker for $\pi$ if it satisfies the following property.

Suppose each node u in $\chi$ is given as input both the input of $\pi$ (i.e., I) and a value O' that purports to be the output of $\pi$ on input I. Informally, $\chi$ must detect (at some node) whether the purported output O' could have been produced in some run of $\pi$ on input I. Thus $\chi$ has a boolean output at every node. If any node outputs the value false, then it must be that O' could never have been produced by $\pi$; if all nodes output the value true, then it must be that O' could have been produced by $\pi$.

Note that our definition only allows deterministic synchronous protocols whose output is a function of their input. Thus for such protocols, $\pi$ has a trivial checker that simply runs $\pi$ again and compares the output $O_u(s)$ at each node with the purported output $O'_u$. The checker outputs true at node u iff $O_u(s) = O'_u$.
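A sketch of the trivial checker follows, with the distributed re-execution collapsed into a single function call for brevity. `run_pi` stands for the input-to-output map of any deterministic protocol; both it and `trivial_checker` are hypothetical names, and the global-minimum protocol used below is only an illustrative stand-in.

```python
from typing import Any, Callable, Dict, Hashable

Node = Hashable


def trivial_checker(run_pi: Callable[[Dict[Node, Any]], Dict[Node, Any]],
                    inputs: Dict[Node, Any],
                    purported: Dict[Node, Any]) -> Dict[Node, bool]:
    """Re-run deterministic pi on input I; node u outputs True iff O_u(s) equals O'_u."""
    actual = run_pi(inputs)
    return {u: actual[u] == purported[u] for u in inputs}


def run_min(inputs: Dict[Node, int]) -> Dict[Node, int]:
    """Stand-in for pi: every node's output is the global minimum input."""
    m = min(inputs.values())
    return {u: m for u in inputs}
```

Because $\pi$ is deterministic, its output on I is unique, so a single node reporting false is enough to show that O' could never have been produced.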

9.3 Results 

Recall the definitions of stabilization time and space complexity for a non-interactive protocol $\mathcal{P}$ and the definitions of time and space complexity for a synchronous protocol $\pi$. Recall that n is the number of nodes in the network.

Figure 9.1: Comparison of the complexity of our compilers

Our solutions are in the form of compilers that can compile synchronous protocols 
into stabilizing versions that have the same I/O relation when run on an asynchronous 
network. The performance of our compilers is summarized in Figure 9.1.

Our simplest compiler is the Rollback compiler, which takes a synchronous protocol $\pi$ as input and produces a stabilizing, non-interactive version of $\pi$. This is expressed as the following theorem:

Theorem 9.3.1 Rollback Compiler: Consider any synchronous protocol $\pi$ for augmented graph G with I/O relation R, time complexity $T_\pi$ and space complexity $S_\pi$. Then there is a corresponding non-interactive protocol $\mathcal{P} = (M, O)$ such that:

• M is a UIOA.

• The space complexity of $\mathcal{P}$ is $O(T_\pi \cdot S_\pi)$.

• The stabilization time of $\mathcal{P}$ with respect to R is $O(T_\pi)$.

The main contribution of this chapter is a second compiler called a Resynchronizer. In its simplest form, the Resynchronizer takes a deterministic synchronous protocol $\pi$ as input and produces a stabilizing non-interactive version of $\pi$.

Theorem 9.3.2 Resynchronizer Compiler: Consider any synchronous protocol $\pi$ for augmented graph G with I/O relation R, time complexity $T_\pi$ and space complexity $S_\pi$. Then there is a corresponding non-interactive protocol $\mathcal{P} = (M, O)$ such that:

• M is a UIOA.

• The space complexity of $\mathcal{P}$ is $O(S_\pi)$.

• The stabilization time of $\mathcal{P}$ with respect to R is $O(T_\pi + n)$.

Our original proof of the Resynchronizer Compiler Theorem was based on a com- 
plicated construction given in [AV91]. We conjecture that the simplified construction 
in Section 9.6 also provides a proof of the same theorem. 

We will also see in Section 9.7.2 how to use the Resynchronizer to compile random- 
ized, synchronous protocols (of course, this would entail generalizing our model of a 
synchronous protocol to allow coin-tosses at each node.) 

Note that, when compared to the Resynchronizer, the Rollback construction removes the additive factor of n in the stabilization time but increases the space by a factor of $T_\pi$. Thus Rollback is useful only for "fast" protocols that have $T_\pi \ll n$. The two compilers can be used to efficiently stabilize several non-interactive tasks.
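To make the trade-off concrete, the bounds of Theorems 9.3.1 and 9.3.2 can be evaluated with every hidden constant set to 1. This illustrates only the shape of the asymptotic bounds, not real performance; the values n = 1024 and $S_\pi$ = 100 and the function names are arbitrary.

```python
from typing import Tuple


def rollback_bounds(T_pi: int, S_pi: int) -> Tuple[int, int]:
    """(stabilization time, space) from Theorem 9.3.1, with the O() constants set to 1."""
    return T_pi, T_pi * S_pi


def resynchronizer_bounds(T_pi: int, S_pi: int, n: int) -> Tuple[int, int]:
    """(stabilization time, space) from Theorem 9.3.2, with the O() constants set to 1."""
    return T_pi + n, S_pi


n, S_pi = 1024, 100
fast_T = 10        # T_pi = log2(n): a "fast" protocol
slow_T = n         # T_pi = n: a protocol whose running time is linear in n
```

For the fast protocol, Rollback stabilizes roughly 100 times sooner at 10 times the space; for the slow protocol its time advantage shrinks to a factor of 2 while its space penalty grows to a factor of 1024.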

Some sample results obtained by applying the Resynchronizer are as follows. For the problems of computing a spanning tree and single source shortest paths we achieve $O(n)$ stabilization time and $O(\log n)$ space. This is comparable to the results achieved in Chapter 7 and the best previous results. For the problem of computing a minimal spanning tree [GHS83] we achieve $O(n)$ stabilization time (which is as good as the time of the best non-stabilizing synchronous protocol) using $O(\log n)$ space. For the problem of computing a maximum flow [Gol85, GT88] we achieve $O(n^2)$ stabilization time, which is as good as the time of the best non-stabilizing synchronous protocol.

The Rollback compiler gives good results when applied to symmetry breaking problems such as computing a Maximal Independent Set [AGLP89], $\Delta + 1$ coloring in sparse networks [GPS87], and $\Delta^2$ coloring in general networks [Lin87]. For instance, for $\Delta + 1$ coloring in sparse networks we achieve $O(\log^* n)$ for both measures. This is much better than any previous results.

9.4 Distributed Program Checking 

A deterministic sequential algorithm can make itself stabilizing by periodically saving 
its output and running itself again; when it finishes it can check its output. For 
sequential algorithms, this is ugly and unnatural: after all, we want the computer to
move on and run other programs! 

However, as we have said before, periodic checking is quite natural for distributed 
computing. Once we have accepted the inevitable periodic cost of stabilization we can 
ask: why not run a checking process to check the algorithm periodically? If the check
reveals a problem, we restart the main algorithm. In the previous chapters we have 
already seen how we can do this in some cases if each link subsystem independently 
does local checking. However, that method (which we can call independent checking 
of link subsystems) is limited to locally checkable protocols. 

One approach to check a distributed program introduced by Katz and Perry [KP90] 
is to collect all information at a single "leader" node, thus reducing distributed checking 
to centralized checking. However, the space and time complexities of this method are 
quite large because of the bottleneck at the "leader" node. 

In this chapter, we will introduce a form of distributed checking that is neither 
the independent local checking of the previous chapters nor the centralized checking 
introduced by Katz and Perry. In many cases, we can improve performance by doing 
distributed program checking. 

Such distributed program checking clearly requires coordination of all the network 
nodes. For example, we need to ensure that a node not move to a new phase (whether 
it be checking or executing) before every other node in the network has completed this 
phase. The main difficulty is to implement this coordination in a stabilizing fashion. 
We will describe two such implementations - the Rollback protocol in Section 9.5 and 
the Resynchronizer in Section 9.6. 

9.5 Rollback Compiler 

There is a naive implementation of distributed checking that requires a large amount 
of storage and bandwidth and works only for deterministic protocols. In the naive 
method, every node keeps a log of every state transition it has taken to reach its 
current state. If each node constantly sends its current log to all neighbors, every 
node can check and correct every transition it has made in the past. Since the inputs 
are always correct, eventually all transitions are corrected. This method works only 
because it is possible to check transitions in a stabilizing fashion. 
Clearly, for an arbitrary asynchronous protocol the logs can grow very large. However, if the asynchronous protocol is simulating an underlying synchronous protocol $\pi$, then the size of the log can be reduced to the time complexity $T_\pi$ of $\pi$. This idea is implemented in the dynamic synchronizer of [AS88]. However, in [AS88] the logs are only used to avoid unnecessary recomputation after an input change, and hence are not periodically checked. By adding the periodic checking of logs, we obtain a Modified Dynamic Synchronizer that we call Rollback.

The disadvantage of the Rollback method is that it blows up the space utilization and periodic bandwidth by a factor of $T_\pi$. This is not a problem for protocols with small time complexity, i.e., those for which $T_\pi \ll n$. Thus the naive method leads to efficient solutions for such problems as coloring and MIS. However, for protocols in which $T_\pi > n$, Rollback is a poor choice.

Notice that the Rollback compiler combines the use of logs with a simplified form of 
local checking and correction. The Rollback compiler does local checking by periodic 
sending of state (as opposed to doing a local snapshot). This is because the Rollback 
compiler is one-way checkable (see Chapter 8 for a definition of a one-way checkable 
protocol). Also, the Rollback compiler does local correction by simply correcting the 
local state of a node when periodic sending detects a problem (as opposed to doing a 
local reset). If local correction can be performed in this simple manner, we will call the 
protocol one-way correctable. Chapter 10 discusses one-way correctability in a little 
more detail. 
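The log-based check-and-correct loop can be sketched in Python under several simplifying assumptions: logs are lists indexed by pulse, entry 0 is the initial state $s^0_u$ derived from the incorruptible input, node u knows the message functions of its neighbors, and the latest logs u has received are modelled by one shared dictionary. The function name `rollback_correct` is hypothetical; this illustrates one-way checking and correction of logs, not the exact construction behind Theorem 9.3.1.

```python
from typing import Any, Callable, Dict, Hashable, List

Node = Hashable
State = Any


def rollback_correct(u: Node, adj: Dict[Node, List[Node]],
                     logs: Dict[Node, List[State]], s0: Dict[Node, State],
                     M: Callable[[Node, State], Any],
                     F: Callable[[Node, State, Dict[Node, Any]], State],
                     T: int) -> bool:
    """Node u re-derives every entry of its own log from the neighbor logs it holds.

    Entry p is recomputed from u's (already corrected) entry p - 1 and the
    neighbors' entries p - 1, exactly as round p of pi would compute it.
    Returns True iff some entry was wrong (a local violation was corrected).
    """
    new_log = [s0[u]]                         # entry 0 comes from the incorruptible input
    for p in range(1, T + 1):
        msgs = {v: M(v, logs[v][p - 1]) for v in adj[u]}
        new_log.append(F(u, new_log[p - 1], msgs))
    changed = new_log != logs[u]
    logs[u] = new_log                         # one-way correction: u rewrites only its own log
    return changed


def send_value(v: Node, s: int) -> int:
    return s


def keep_min(u: Node, s: int, msgs: Dict[Node, int]) -> int:
    return min([s, *msgs.values()])


adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
s0 = {"a": 5, "b": 9, "c": 2}
T = 2
logs = {u: [0] * (T + 1) for u in adj}        # arbitrarily corrupted logs
# After k sweeps, entries 0..k-1 of every log are correct, so T + 1 sweeps suffice.
for _ in range(T + 1):
    for u in adj:
        rollback_correct(u, adj, logs, s0, send_value, keep_min, T)
```

Once every log matches the synchronous execution, further checks detect no violation, and the last entry of each log holds the node's output.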



9.6 Simplified Resynchronizer Compiler 

In the previous section, we saw how Rollback does distributed checking by maintaining a log of its computation. Consider a deterministic protocol $\pi$. Clearly, we can avoid the need for a log if we simply re-execute $\pi$. However, this constant re-execution requires coordination among the network nodes, which must be implemented in a stabilizing fashion. In general, the Resynchronizer avoids a log by constantly re-executing a checker $\chi$ for $\pi$. We introduce the basic idea by assuming that $\pi$ is deterministic and that $\pi$ is its own checker. We return to separate checking later.

The Resynchronizer can be thought of as a stabilizing version of a synchronizer 
[Awe85]. Any synchronous protocol can be simulated asynchronously by using a pulse 
number at each network node. Let us call a node synchronized if its pulse number
differs by at most 1 from any of its neighbors. In ordinary synchronizers, every node is 
initially synchronized by setting the pulse numbers of all nodes to 0. Thereafter, a node 
increments its pulse number from p to p + 1 only after all its neighbors have reached 
pulse p, thereby maintaining synchronization. If each node executes the corresponding 
step of the synchronous protocol at pulse p just before incrementing to p + 1, then 
the asynchronous protocol has the same I/O relation as the underlying synchronous 
protocol. 
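The ordinary (non-stabilizing) synchronizer rule can be illustrated with a small Python simulation. This is a sketch under assumptions: the network is a hypothetical adjacency list, and a seeded random scheduler stands in for asynchrony. Starting from correct initialization, a node moves from pulse p to p + 1 only when all its neighbors have reached p, and the assertion checks after every step that all nodes remain synchronized.

```python
import random
from typing import Dict, List


def synchronizer_run(adj: Dict[int, List[int]], steps: int, seed: int = 0) -> Dict[int, int]:
    """Simulate the basic synchronizer rule for a number of asynchronous steps."""
    rng = random.Random(seed)
    pulse = {u: 0 for u in adj}  # ordinary synchronizers rely on correct initialization
    for _ in range(steps):
        # u may advance from p to p + 1 once every neighbor has reached pulse p
        enabled = [u for u in adj if all(pulse[v] >= pulse[u] for v in adj[u])]
        # a node holding the globally smallest pulse is always enabled, so this is non-empty
        u = rng.choice(enabled)
        # (the step of the synchronous protocol for pulse pulse[u] would run here)
        pulse[u] += 1
        # invariant: every node's pulse differs by at most 1 from each neighbor's
        assert all(abs(pulse[a] - pulse[b]) <= 1 for a in adj for b in adj[a])
    return pulse
```

The invariant holds because an enabled node at pulse p has all neighbors at p or p + 1, so after moving to p + 1 it still differs from each by at most 1.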

Since a stabilizing synchronizer cannot rely on correct initialization, we introduce a Termination_Detection phase. This phase does two things. First, it ensures that each node has finished executing the synchronous protocol. Second, once each node has finished executing the synchronous protocol, a reset (see Chapter 7) is initiated. Once all nodes get a signal (see Chapter 7), they reset their pulse numbers to 0 and the cycle continues.

Thus the Resynchronizer can be considered an application of global correction to the simplest synchronizer protocol described in [Awe85]. Our original construction and proof [AV91] relied on a reset protocol specially crafted to work with the synchronizer protocol. In this chapter, we describe a simplified version of the construction that uses the general-purpose network reset protocol of Chapter 7. While we do not have a complete proof of the simplified construction, we believe that it provides the basis for a simpler and more elegant solution. In what follows, we only describe the simplified construction.

When pulse_u ∈ [0, T_π], we say that node u is in the Execute phase. In the Execute phase, node u simply executes the normal synchronizer algorithm described earlier. It also implements the underlying synchronous protocol, starting from initialization at pulse 0 and writing its output at pulse T_π. Let D be an upper bound on the diameter of the network.¹ We assume that D < n.

We denote by Max = T_π + D the maximum value of pulse_u. When pulse_u ∈ [T_π + 1, Max], we say that node u is in the Termination_Detection phase. Suppose that all nodes are synchronized when some node exits the Execute phase. Then the Termination_Detection phase ensures that every node has had a chance to correct its output before the pulse number of some node wraps around to 0. If the nodes are synchronized by the start of this phase, a node need only wait D dummy pulses to make sure that all other nodes have reached the Termination_Detection phase.

¹ Unfortunately, our protocol needs an upper bound on the diameter of the network. This is a liability of the protocol.

If pulse_u reaches Max, node u makes a reset request and waits to get a signal. This potentially destroys synchronization. However, we can rely on the properties of the reset protocol to deliver a signal at all nodes in a consistent way. When node u receives a signal, it sets its pulse number to 0 and the cycle continues. Notice that all nodes constantly re-execute the underlying synchronous protocol.
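The cycle of a single node can be summarized by a small lookup, sketched here in Python. The function names and action strings are hypothetical labels, not part of the thesis. The only facts encoded are the phase ranges [0, T_π] and [T_π + 1, Max], initialization at pulse 0, output correction at pulse T_π, and the reset request at Max = T_π + D.

```python
from typing import List


def phase(pulse: int, t_pi: int, d: int) -> str:
    """Return the Resynchronizer phase of a node at the given pulse number."""
    max_pulse = t_pi + d  # Max = T_pi + D
    if not 0 <= pulse <= max_pulse:
        raise ValueError("pulse numbers are restricted to 0..Max")
    return "Execute" if pulse <= t_pi else "Termination_Detection"


def actions_at_pulse(pulse: int, t_pi: int, d: int) -> List[str]:
    """List what a node does when its pulse number reaches `pulse` (illustrative labels)."""
    actions = []
    if pulse == 0:
        actions.append("initialize pi")
    if phase(pulse, t_pi, d) == "Execute":
        actions.append(f"execute pulse {pulse} of pi")
        if pulse == t_pi:
            actions.append("correct output to the local output of pi")
    else:
        actions.append("dummy pulse")  # give slower nodes time to finish executing
    if pulse == t_pi + d:
        actions.append("request reset and wait for signal")  # the signal resets pulse to 0
    return actions
```

With T_π = 3 and D = 2, pulses 4 and 5 are the D dummy pulses, and pulse 5 (Max) also issues the reset request.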

The Resynchronizer compiler must overcome two difficulties. First, it must deal with arbitrary pulse numbers and arbitrary messages on links in the initial state. Second, it must cope with the fact that the pulse numbers are finite and hence have to wrap around. Recall that one of our motivations for studying stabilization was to make real network protocols more fault-tolerant, and any real counter implementation is bounded.

In our description and in the code below we only describe the Resynchronizer as a 
tool that can be used to compile a deterministic protocol by re-executing it. It is easy 
to extend these ideas slightly to add a separate checking phase to the Resynchronizer. 

9.6.1 Resynchronizer Code 

We have described how to reduce the problem of stabilizing the output of a synchronous protocol π to the problem of building a stabilizing synchronizer that constantly re-executes the pulse numbers of the Execute phase. Thus, when presenting the code, for simplicity we ignore the details of executing π; instead, we concentrate on the major task of synchronizing pulse numbers. To actually execute π, we need to supplement the code as follows:

• Additional state variables are added at every node u to keep track of the state of π. Also, we assume that every pulse message with pulse p carries the state of π (at the sending node) at pulse numbers p and p − 1.

• Whenever pulse_u reaches p and p is in the Execute phase, the synchronous protocol π is executed at node u. (Sometimes the code will cause the pulse number at node u to jump from, say, p to p + 2. In that case, the synchronous protocol must be executed at pulse p + 1 and pulse p + 2.)

• Whenever pulse_u reaches T_π, node u corrects its output to be the local output of π.

The protocol is formally presented below in Figures 9.2 and 9.3. Every node u keeps the variables described in Figure 9.2.

We assume that each node is a user of the reset protocol described in Chapter 7. As in Chapter 7, we assume that the reset protocol has an interface to make reset requests and accept reset signals, as shown in Figure 7.2 in Chapter 7. Recall also that the reset subsystem has an output action FREEM that tells the user (in this case the Resynchronizer) when it can send a new user message. Of course, the similarity to the FREE action of a UDL is no coincidence. Just as the user of a UDL needs to keep a free variable to record whether the UDL is free, each node (exactly as in the spanning tree protocol of Chapter 8) keeps a variable free for each neighbor. In fact, the entire protocol bears a strong resemblance to the spanning tree protocol of Chapter 8, although it performs an entirely different function.

Every node u periodically sends its own pulse number to all neighbors, using a Pulse message encoded as the tuple (Pulse, p). When a node v receives a Pulse message from u, v compares the pulse in the message (say p) with the previous pulse estimate stored for u (pulse_v[u]). During normal synchronizer operation, the following local predicate always holds: pulse_v[u] ≤ p and p ≤ pulse_v + 1. (Intuitively, the pulse estimates sent by u to v are non-decreasing and are never more than one higher than the current pulse number of v.) If v realizes that the local predicate has been violated, v makes a reset request. Otherwise, v stores the latest estimate it has received from u and sets its own pulse number to one more than the minimum of all its neighbor estimates (predicate S1 of Figure 9.4).

Once node u finds that pulse_u = Max, it makes a reset request. Once node u gets a signal, it sets pulse_u to 0 and resumes normal synchronizer operation.

We assume that the topology of the network is specified by an augmented graph G. Let SS_u be the automaton shown in Figure 9.3. Let S be the composition of the automata SS_u, for every node u in G, with the reset protocol of Chapter 7.
Variables

• pulse_u: Highest pulse that was performed correctly until now. Its domain is restricted to {0..Max}. Recall that Max = T_π + D, where D is an upper bound on the network diameter.

• pulse_u[v]: Node u's estimate of the current pulse number at node v. During normal synchronizer operation, the estimate is always a lower bound on the true value. Its domain is also restricted to {0..Max}.

• free_u[v]: Boolean which, if true, indicates that the link to v is free to accept new messages.

• requestbit_u[v]: Boolean, which is set to true to remember that node u should make a reset request in the future.

Figure 9.2: Declarations of variables at node u used by the Resynchronizer code.
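The code of Figure 9.3 is not reproduced in this text, so the following Python sketch is only a hypothetical reading of the prose description, not the thesis's Figure 9.3. It keeps pulse_u and pulse_u[v] from Figure 9.2 (as `pulse` and `est`) and uses a single pending-request flag in place of the per-neighbor requestbit_u[v]. Periodic sending, the free_u[v] flags, execution of π, and the reset protocol itself are abstracted away. Resetting the estimates to 0 on a signal is an assumption made only for this sketch.

```python
from typing import Dict, Iterable, Tuple


class ResyncNode:
    """Hypothetical sketch of one Resynchronizer node u (not the thesis's Figure 9.3)."""

    def __init__(self, neighbors: Iterable[int], max_pulse: int) -> None:
        self.max_pulse = max_pulse                            # Max = T_pi + D
        self.pulse = 0                                        # pulse_u, kept in 0..Max
        self.est: Dict[int, int] = {v: 0 for v in neighbors}  # pulse_u[v]
        self.reset_pending = False                            # a reset request was made

    def pulse_message(self) -> Tuple[str, int]:
        """The (Pulse, p) message u periodically sends to every neighbor."""
        return ("Pulse", self.pulse)

    def on_pulse(self, v: int, p: int) -> None:
        """Handle a (Pulse, p) message received from neighbor v."""
        if self.reset_pending:
            return                                            # waiting for a signal
        if not (self.est[v] <= p <= self.pulse + 1):          # local predicate violated
            self.reset_pending = True                         # make a reset request
            return
        self.est[v] = p                                       # store the latest estimate
        # one more than the smallest neighbor estimate, capped at Max
        self.pulse = min(min(self.est.values()) + 1, self.max_pulse)
        if self.pulse == self.max_pulse:                      # end of Termination_Detection
            self.reset_pending = True

    def on_signal(self) -> None:
        """Reset signal from the reset protocol: restart the cycle at pulse 0."""
        self.pulse = 0
        self.est = {v: 0 for v in self.est}                   # assumption of this sketch
        self.reset_pending = False
```

The acceptance test in `on_pulse` is the local predicate pulse_v[u] ≤ p ≤ pulse_v + 1 from the text, written from the receiving node's point of view.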

9.6.2 Proof Sketch for the Simplified Resynchronizer Construction

We outline an argument that explains why we believe the simplified construction is 
correct. We emphasize that this is an intuitive argument that needs further polishing. 

Just as in the spanning tree protocol of Chapter 8, the normal operation of the synchronizer can be defined by two one-way predicates S1(u,v) and S2(u,v), shown in Figure 9.4. The first states that a node's pulse number is one higher than the smallest estimate it has from any neighbor. The second states that the pulse estimates sent by node u to neighbor v are non-decreasing; moreover, any estimate sent by u can never be more than one higher than the current pulse number of v. If we compare the local predicates of our spanning tree protocol (Figure 8.7) with the local predicates of the synchronizer (Figure 9.4), we notice a remarkable similarity between two quite different protocols.

If these two local predicates hold for all edges (u,v) we can show that all nodes 
are synchronized - i.e., the pulse number of each node differs by at most 1 from any of 
its neighbors. This is sufficient to ensure that the pulse numbers will eventually grow 
until some node reaches the maximum value. 

Figure 9.3: Resynchronizer Code for any node u

Unlike the spanning tree protocol, however, the Resynchronizer protocol keeps making reset requests, and there will never be final signal intervals in any execution of the protocol. Thus we need to make heavy use of the causality property of reset behaviors. For example, we can show that within O(n) time after the start of any suffix of an execution, some node must make a reset request. This is proved by contradiction. If not, in linear time all signals must stop, and in constant time after this all local predicates must hold. (The second part follows because the code makes a reset request if it detects a violation of S2(u,v); also, the receipt of the first pulse message at u will make S1(u, *) hold.) Thus, by normal synchronizer operation, the node with the lowest pulse number will increase its pulse number by 1 in constant time; hence in O(Max) = O(n) time some node will have reached the maximum pulse number and will make a reset request.

The overall argument consists of showing that there is some t-suffix γ of an execution of the Resynchronizer protocol, where t = O(n), such that:

1. All nodes receive a signal event in linear time after the start of γ.

2. In linear time after all nodes reach pulse number 0, some node reaches pulse number Max and all nodes reach pulse number T_π = Max − D.

As in the proof of the Global Correction Theorem, we can argue that in linear time all reset requests that are caused by local predicate violations will disappear. Thus we choose γ such that reset requests in γ are only caused by nodes reaching the maximum pulse number Max.

The intuition behind the first part is that since some node makes a reset request within O(n) time after the start of γ, all nodes will receive a signal within O(n) time in γ. Notice that when a node gets a signal, it resets its pulse number to 0.

The intuition behind the second part is as follows. After all nodes have reached pulse number 0, we can again argue that some node makes a reset request in linear time. By our choice of γ, a reset request is only caused by some node, say u, reaching pulse number Max. But since u was previously at pulse number 0 in γ, u's pulse number must have grown from 0 to Max in γ. We then argue that all neighbors of u have grown from 0 to Max − 1 in γ, and finally that all nodes have grown from 0 to Max − D in γ.

Finally, if all nodes grow from 0 to Max − D in γ, we argue that by the end of γ all nodes have correct output. Intuitively, this is because each node begins executing the synchronous protocol at pulse 0 and corrects its output at pulse T_π = Max − D.
S1(u,v):

$$\mathit{pulse}_u = \min\{\mathit{pulse}_u[w] : w \in \mathit{neighborSet}_u\} + 1$$

S2(u,v):

If there is a (Pulse, p) message in transit from u to v, then

$$\mathit{pulse}_v[u] \le p \le \mathit{pulse}_u \quad\text{and}\quad p \le \mathit{pulse}_v + 1$$

Else

$$\mathit{pulse}_v[u] \le \mathit{pulse}_u$$

Figure 9.4: Normal Synchronizer: Local Predicates for edge (u,v).
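The two predicates can be transcribed directly into Python as a sketch. The inequalities are read as non-strict (≤), which is what the prose's "non-decreasing" and "never more than one higher" require, and at most one (Pulse, p) message is assumed to be in transit from u to v, as the figure's wording suggests. Function and parameter names are hypothetical. Each check uses only the state of the two endpoints of (u,v) and the message between them, which is what makes the predicates local.

```python
from typing import Dict, Optional


def s1(pulse_u: int, est_u: Dict[int, int]) -> bool:
    """S1(u, .): pulse_u is one more than the smallest estimate u holds for its neighbors."""
    return pulse_u == min(est_u.values()) + 1


def s2(pulse_u: int, pulse_v: int, est_v_of_u: int, in_transit: Optional[int]) -> bool:
    """S2(u, v), where est_v_of_u is pulse_v[u] and in_transit is the p of a
    (Pulse, p) message from u to v, or None if no such message is in transit."""
    if in_transit is not None:
        p = in_transit
        return est_v_of_u <= p <= pulse_u and p <= pulse_v + 1
    return est_v_of_u <= pulse_u
```

For instance, a message carrying p = 0 to a node whose stored estimate is 1 violates S2, which is exactly the case in which the receiver makes a reset request.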

9.7 Extensions 

9.7.1 Better Synchronous Checkers for Deterministic Protocols

Suppose we limit ourselves to stabilizing protocols π that execute a distinct checking protocol χ even after reaching a legal state. Let us informally define the stabilization bandwidth of π to be the worst-case message complexity per link of the checking protocol. Clearly, the checking protocol must be executed at least once every T time units (where T is the stabilization time), even after the protocol has stabilized. Hence this is really a bandwidth cost. For example, the stabilization bandwidth of the protocol in [KP90] is the worst-case message complexity per link to do a snapshot, which is O(m), where m is the number of links.

If we check a deterministic protocol π by re-executing π, we have to pay a high price (T_π) in stabilization bandwidth. Suppose instead that we can find a checker χ for π with T_χ = 1, i.e., one that can check π in a single pulse. Then after the execution phase we can add a single check pulse number T_cp. When a node reaches pulse T_cp, it stays at T_cp executing the checking protocol until it detects a problem; if it does detect a problem, it drops back to pulse 0. By avoiding multiple pulses for checking, we remove the need for a costly termination detection and reset operation in the checking phase. The stabilization bandwidth drops to O(S_π).
There are a number of tasks that we can check in a single pulse. These include
the classical problems of shortest paths, topology update, leader election, computing a 
spanning tree, and computing a maximum flow. Because the first four tasks are com- 
monly used in real networks, it is important to improve their stabilization bandwidth. 

We do so by essentially introducing a new form of local checking, which we call synchronous local checking. The idea of local checking was already introduced in previous chapters, where we described asynchronous protocols that continuously checked the state of every link subsystem. Note that the state of every link subsystem includes the state of the channels as well as that of the nodes. However, to build an efficient synchronous checker, all we have to do is locally check a synchronous protocol that has terminated. Hence the checking procedures are much simpler: for a synchronous protocol, the state of every link subsystem consists only of the state of its two nodes.

Consider the problem of checking the shortest-cost paths from a given source to every other node. The key is the ability to check, in a single pulse, the shortest-cost distances from the source to every other node. Let s denote the source and D_i the distance that node i has recorded to s; let B_ij denote the cost of the link from node i to node j, and N(i) the set of neighbors of node i. Then in a checking pulse, (a) s checks that D_s = 0, and (b) every node i other than s checks that D_i > 0 and D_i = min_{j ∈ N(i)} (D_j + B_ij). It is quite simple to prove that if any of the distances are wrong, some node will detect this after checking.
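The single-pulse check (a) and (b) can be sketched in Python as follows, assuming a connected graph with positive integer link costs stored as a hypothetical adjacency list of (neighbor, cost) pairs. Each node looks only at its own label and its neighbors' labels. One way to see why the check suffices: with positive costs, the equations D_s = 0 and D_i = min_{j ∈ N(i)} (D_j + B_ij) have the true distances as their only solution, so labels that pass every node's check must be correct.

```python
from typing import Dict, List, Tuple

# node -> list of (neighbor j, link cost B_ij); costs positive, graph connected
Graph = Dict[int, List[Tuple[int, int]]]


def check_distances(graph: Graph, source: int, dist: Dict[int, int]) -> List[int]:
    """One synchronous checking pulse; returns the nodes that detect an error."""
    alarms = []
    for i, links in graph.items():
        if i == source:
            ok = dist[i] == 0                                # (a) D_s = 0
        else:
            best = min(dist[j] + cost for j, cost in links)  # min over j in N(i)
            ok = dist[i] > 0 and dist[i] == best             # (b) D_i > 0 and D_i = best
        if not ok:
            alarms.append(i)
    return alarms
```

For a triangle with costs B_01 = 1, B_12 = 2, B_02 = 5 and source 0, the labels {0: 0, 1: 1, 2: 3} raise no alarm, while corrupting node 2's label to 5 makes node 2 detect the error.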

By adding an extra distance label to the other tasks, the other tasks can also be 
checked in one pulse, using the same trick. We can do this, for instance, in leader 
election by adding the distance to the leader, and in maximum flow by adding the 
distance to the source in the residual graph. 
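For leader election, one possible instance of this trick is sketched below in Python; it is a hypothetical formulation, not taken from the thesis. Every node holds its ID, the ID of the leader it believes in, and its hop distance to that leader. Assuming a connected network with unique IDs, if no node raises an alarm, then all nodes agree on one leader value ℓ and no ID exceeds ℓ. Moreover, the node with the smallest distance label cannot be a non-leader (it would need a neighbor with an even smaller label), so it owns ID ℓ, and the agreed leader really is the node with the largest ID.

```python
from typing import Dict, List

Adj = Dict[int, List[int]]


def check_leader(adj: Adj, uid: Dict[int, int], leader: Dict[int, int],
                 dist: Dict[int, int]) -> List[int]:
    """One checking pulse for leader election with a distance-to-leader label."""
    alarms = []
    for i in adj:
        ok = all(leader[j] == leader[i] for j in adj[i])  # neighbors agree on the leader
        ok = ok and uid[i] <= leader[i]                   # no node has a larger ID
        if uid[i] == leader[i]:
            ok = ok and dist[i] == 0                      # the leader is at distance 0
        else:
            ok = ok and dist[i] > 0 and dist[i] == min(dist[j] for j in adj[i]) + 1
        if not ok:
            alarms.append(i)
    return alarms
```

On a path 0-1-2 with IDs 5, 9, 3, the correct state (everyone believes in 9, distances 1, 0, 1) raises no alarm, whereas a state in which everyone believes in 5 is caught by node 1, whose own ID 9 is larger.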

In general, given our application, the design of faster checkers for synchronous protocols becomes an interesting and practical problem. For instance, can we check a minimum spanning tree in fewer pulses than it would take to compute the tree from scratch, while using only small storage? There are a number of similar open problems that arise from our work.
9.7.2 Randomized Protocols

In all our models, of both asynchronous and synchronous protocols, we have disallowed the possibility of nodes tossing coins. Thus our formal models preclude the description of randomized protocols, in which nodes are given the ability to toss coins. The definitions of correctness now become probabilistic, and it is much harder to give careful definitions of stabilization. However, an intuitive understanding of randomized protocols suffices for the following informal discussion.

First, for randomized protocols, we will assume that any supposedly random bits 
in the initial state of a node can be arbitrarily corrupted and hence non-random. 
However, subsequent coin-tosses will produce truly random bits. 

Next, note that we need a separate checker to compile a randomized protocol. 
This is because re-executing the original protocol can lead to a different output, and 
cause the checker to detect an error when there was none. Saving the original random 
bits in the state does not help either as these bits could be corrupted (see previous 
assumption). Further, this checker must be oblivious: it must not depend on the 
correctness of the supposedly random bits currently in the state. It appears that a 
stabilizing algorithm that uses a randomized checker needs an infinite supply of random 
bits since it cannot rely on the old random bits at any stage. 

A simple example of compiling a randomized protocol is furnished by the problem of electing a leader in an anonymous network, i.e., a network in which nodes do not have unique IDs.² Clearly, we need randomization to break symmetry. To construct a stabilizing protocol for this task, we demonstrate a synchronous protocol for execution together with an oblivious synchronous checker.

Recall that we use D to denote an upper bound on the diameter of the network. In the execution protocol, each node picks a random ID uniformly and independently from the space 1..X. In the next D pulses, a node considers itself the leader if it finds out that its ID is the largest in the network. In the checking protocol, each node i that considers itself a leader picks a new random value t_i in the space 1..X and broadcasts t_i during the next D pulses. At the end of D pulses, a node detects an error if it has received either no values or more than one value. While both checking and execution can fail, by picking X to be a polynomial (of sufficiently high degree) in the number of nodes, we can ensure that a correct output will be produced in a constant expected number of phases. Since each phase takes O(D) time, the expected stabilization time is O(D).

² If nodes have unique IDs, we must assume that the IDs are protected; dropping this assumption makes the system more fault-tolerant.
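One execution phase followed by one checking phase can be simulated centrally in Python, as a sketch under stated assumptions: synchronous pulses are modeled as rounds of flooding over a hypothetical adjacency list, the dictionary keys are simulator indices that the anonymous nodes never see, and the helper names are illustrative. Because D bounds the diameter, D rounds of flooding let every node learn the largest ID and every broadcast value t_i.

```python
import random
from typing import Dict, List, Set, Tuple

Adj = Dict[int, List[int]]


def flood(adj: Adj, values: Dict[int, Set[int]], rounds: int) -> Dict[int, Set[int]]:
    """Synchronous flooding: in each pulse, every node merges its neighbors' value sets."""
    for _ in range(rounds):
        values = {u: values[u].union(*(values[v] for v in adj[u])) for u in adj}
    return values


def election_phase(adj: Adj, d: int, x: int, rng: random.Random) -> Tuple[List[int], bool]:
    """One execution phase followed by one checking phase; d bounds the diameter."""
    # Execution: random IDs in 1..X; after D pulses each node knows the largest ID.
    ids = {u: rng.randint(1, x) for u in adj}
    seen = flood(adj, {u: {ids[u]} for u in adj}, d)
    leaders = [u for u in adj if ids[u] == max(seen[u])]
    # Checking: every self-declared leader broadcasts a fresh random value t_i.
    fresh = {u: ({rng.randint(1, x)} if u in leaders else set()) for u in adj}
    received = flood(adj, fresh, d)
    # A node flags an error if it received no value or more than one value.
    error_detected = any(len(received[u]) != 1 for u in adj)
    return leaders, error_detected
```

Calling `election_phase` with `x=1` illustrates why X must be large: all IDs collide, every node declares itself leader, and the checker is fooled because all leaders broadcast the same value.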

A more efficient protocol for this purpose (that works in time proportional to the 
actual diameter as opposed to a bound on the diameter) is given in [DIM91b]. However, 
our solution seems to be simpler. As in the case of deterministic protocols, checking 
randomized synchronous protocols seems an interesting research area. 



9.8 Summary and open problems 

The main result of this chapter, the Resynchronizer, is a compiler that transforms any synchronous protocol into a stabilizing version for dynamic asynchronous networks. The transformation adds O(n + D) overhead to the time complexity of the protocol, where D is a bound on the diameter of the final network. Clearly, D and n can be much larger than the actual diameter of the final network. A natural open problem is to obtain a compiler whose time overhead depends only on the actual diameter of the final network.

When the results in this chapter were first presented [AV91], the Resynchronizer compiler used a much more complicated construction (which could be regarded as a special reset protocol optimized for synchronizer operation). The transformation in [AV91] added only O(D) overhead to the time complexity of the protocol, removing the factor of n that arises from the use of the reset protocol described in Chapter 7. However, for many real networks the only meaningful bound on the diameter of the network is the number of nodes, so this distinction does not seem meaningful in practice. Also, the construction in this chapter is much simpler than the original. However, a careful proof of the simplified construction is still needed.
Chapter 10

Conclusions and Open Questions 



This thesis investigates the power and applicability of local checking and correction for 
the design of stabilizing network protocols. The thesis provides a rigorous theoretical 
foundation for these concepts. However, the emphasis is on using these concepts to 
devise novel fault-tolerant protocols for real networks. 

We have confined the notion of "local" to link subsystems consisting of a pair of neighboring nodes and the two unidirectional links between them.¹ We have defined a protocol to be locally checkable for a good property L if two conditions hold. First, there is a local property for each link subsystem (formalized by a predicate of the link subsystem) such that property L holds if the local properties of all link subsystems hold. Second, each local property is closed, i.e., once the local property is true, it remains true.

We have also defined a protocol to be locally correctable to property L if the protocol is locally checkable for L and there is a way to locally correct each node with respect to a link subsystem (formalized by the existence of a local reset function) such that the global property L becomes true. The index contains pointers to the formal definitions of these concepts.

This chapter summarizes and evaluates the major contributions of the thesis. The chapter ends with a list of open questions.

¹ However, in the list of open questions we suggest that this notion of locality can be usefully generalized.
10.1 Contributions

The major contribution of this thesis is the construction of general techniques for de- 
signing stabilizing protocols. All our techniques revolve around the concepts of local 
checking and correction. We use the general techniques to design new stabilizing pro- 
tocols (many of which are practical) and to understand existing stabilizing protocols. 
In the process of formalizing these techniques, we were also led to invent new defini- 
tions of stabilization, a simple modularity theorem, and a new Data Link model. We 
believe this modelling effort is useful in its own right. Thus we divide the contributions 
of this thesis into five categories: 

• General techniques for constructing stabilizing protocols. 

• New or improved stabilizing solutions to specific problems. 

• The Modularity Theorem 

• Modelling of self-stabilization. 

• Better understanding of existing work in self-stabilization. 

We describe these contributions in more detail in the following subsections. 

10.1.1 General Techniques 

The thesis contains four general techniques and a powerful heuristic. The four general 
methods are all organized around the theme of local checking and form a natural 
progression of ideas. The methods are: 

• Local Correction: The Local Correction Theorem (Theorem 5.4.3) of Chapter 
5 shows that every locally checkable and correctable protocol can be stabilized in 
time proportional to the height of the underlying partial order. This is achieved 
by adding actions to do independent checking and resetting of each link subsystem.

The Local Correction Theorem formalizes a useful design technique for building 
stabilizing protocols. First, we add some (hopefully small) state to the protocol 
to make it locally checkable. Next, we look for a local reset function that can be
used for local correction. In some cases, it is possible to construct a local reset 
function by combining several actions of the original protocol. In Chapter 7 we 
conjectured that this (i.e., the construction of a local reset function) is possible 
for many existing protocols that work in dynamic networks. The stabilizing reset 
protocol of Chapter 7 is an example of this design technique. 

• Tree Correction: The Tree Correction Theorem (Theorem 6.5.1) of Chapter 6 shows that every locally checkable protocol that works on a tree topology can be stabilized in time proportional to the height of the tree. Thus we can dispense with the need for local correctability if the underlying topology is a tree. This is achieved by adding actions to do independent checking and resetting of each link subsystem in such a way that the correction of a link (u,v) does not invalidate the correctness of links above link (u,v) in the tree.

This theorem also formalizes a useful design technique. First, we construct a spanning tree of the network using a stabilizing tree protocol. Next, we design a locally checkable protocol that solves the desired problem over a tree. Finally, we apply the transformation underlying the Tree Correction Theorem. The stabilizing token passing protocol of Chapter 6 is an example to which this design technique could be applied.²

• Global Correction: The Global Correction Theorem (Theorem 8.1.2) of Chapter 8 shows that every locally checkable protocol can be stabilized in time proportional to the number of network nodes. That is, we can dispense with the need for local correctability or the restriction to a tree topology if we are willing to incur a stabilization time that is proportional to the number of network nodes. This is achieved by adding actions to do independent checking of each link subsystem, as well as actions to do a coordinated global reset of the entire network if a local violation is detected.

This theorem also formalizes a useful design technique. We construct a locally checkable protocol and then apply Global Correction. However, since the stabilization time of this method is comparatively high, it pays to use the Local Correction Theorem whenever a locally checkable protocol can also be shown to be locally correctable. The stabilizing spanning tree protocol of Chapter 8 is an example of the Global Correction technique.

² However, in Chapter 6 this example is derived using Local Correction.

• Stabilizing Compiler for Synchronous Protocols: The Resynchronizer and Rollback compiler ideas in Chapter 9 show that any synchronous protocol π with time complexity T_π can be converted to an asynchronous, stabilizing version of π, with either an additive cost of O(n) in stabilization time or a multiplicative factor of T_π in storage. Thus for such protocols we can dispense with even the need for local checkability. This is achieved by applying global correction to a simple synchronizer protocol. While Chapter 9 sketches the main ideas for both compilers, the method still requires further development, especially to complete and simplify the proofs.

The synchronous compilers also suggest a useful design technique. Suppose the correctness of a task can be specified by an Input/Output relation. Then we can construct a synchronous protocol for the task and apply one of the two compilers. This technique provides stabilizing solutions for many tasks for which locally checkable solutions are not known to exist. For instance, it can be used to produce efficient stabilizing minimal spanning tree, min-cost flow, and coloring protocols.

Besides the four general methods, we have a useful heuristic. 

Heuristic of Removing Unexpected Packet Transitions: In many cases we 
can make a protocol weakly locally checkable for a good property L (i.e., L holds if 
the local properties of all link subsystems hold) by adding a small amount of state. 
To make the protocol locally checkable for L we also have to ensure that each local 
property is closed - i.e., if the local property is ever true, it remains true. This can 
often be done by the method of removing unexpected packet transitions described in 
Chapter 6. The basic idea is that when a node u receives a packet p from a neighbor v, 
u only accepts the packet if this transition could have occurred in some good state of 
the (u, v) subsystem. If the (u, v) subsystem is in a bad state but the (u, w) subsystem 
is not, the only way the (u,v) subsystem can affect the (u,w) subsystem is if v sends 
"unexpected" packets to u. Weeding out such unexpected packet transitions is often 
sufficient to ensure that each local predicate remains closed. 

The heuristic of removing unexpected packet transitions is used throughout the thesis: in the Token Passing Protocol of Chapter 6, the Reset Protocol of Chapter 7, the Spanning Tree Protocol of Chapter 8, and the Resynchronizer Protocol of Chapter 9.

Comparison with the Only Previously Known General Technique

The only previously known general method for stabilization is the elegant result of 
Katz and Perry [KP90]. How do our methods compare with the general method of 
[KP90]? Recall that in [KP90], checking is done by a single leader node periodically 
taking a snapshot of the entire network. 

• Message Congestion: From a practical standpoint, the most important difference is that all our methods have considerably smaller stabilization bandwidth than that of [KP90]. Recall from Chapter 9 that stabilization bandwidth is the periodic overhead of checking that must be paid even when the protocol is in a good state and behaving correctly. Suppose the protocol is to stabilize in time T. Roughly, [KP90] has to pay the price of a snapshot of the entire network every T units of time. Now a snapshot requires Ω(m) state, where m is the number of network links. Thus links leading to the leader node must carry O(m) message bits every T units of time, so the worst-case bandwidth per link is very high. By contrast, in our local correction, tree correction, and global correction methods, each link only carries a constant number of message bits every T units of time. Even the naive synchronous compilers of Chapter 9 have better stabilization bandwidth than the method of [KP90], because the communication overhead of checking is spread out among all the links rather than being concentrated on links leading to the leader. In real networks, each link has a limited bandwidth, and the worst-case bandwidth requirement is an important measure of link congestion.

• Speed: If all links are UDLs, the method of [KP90] requires O(n) time to stabilize, where n is the number of network nodes. The local correction method, tree correction method, and Rollback compiler can all provide much faster stabilization times.

• Storage: The method of [KP90] requires O(m) storage at the leader to store
the snapshot information, where m is the number of links. For fault-tolerance,
every node must be prepared to be the leader of the network and be able to
store O(m) information bits. By contrast, except for the Rollback compiler, the
storage required by our methods is negligible.

• Generality: The method of [KP90] is clearly more general than our methods. Our 
methods require protocols to be either locally checkable or to have a correctness 
specification that can be expressed (see Chapter 9) in terms of an I/O relation. 
However, there is at least one case where the method of local checking and 
correction is applicable where the method of [KP90] is not. This is the stabilizing 
end-to-end protocol that we describe in [APV91b]. The problem here is that some 
unknown set of network links may have infinite delay. Thus the global snapshot 
of [KP90] may never terminate. However, it is sufficient ([APV91b]) to do local 
checking and correction on the so-called viable links that have bounded delay. 

To summarize, our methods are less general but more efficient than the method 
of [KP90]. Despite the loss of generality, our methods can be used to stabilize many 
useful tasks, as summarized below. 

10.1.2 New or Improved Stabilizing Solutions for Specific Problems

Our general techniques provide new or improved solutions for Mutual Exclusion, Net- 
work Resets, Computing Spanning Trees, Topology Update, Minimal Spanning Trees 
and other tasks. We have also applied local correction to an important theoretical 
problem - the problem of end-to-end message delivery in unreliable networks. In this 
problem links can fail continuously; the only guarantee is that there is no cut of
permanently failed links that separates the sender and receiver. We have not described
our solution to this problem in this thesis but more details can be found in [APV91b]. 

There are a number of other protocols to which we believe local checking and 
correction can be applied. These include the efficient resource allocation algorithm of 
Awerbuch and Saks ([AS90]), and the elegant virtual circuit protocol due to Spinelli 
([Spi88b]). We hope to produce stabilizing versions of these protocols. We believe our 
general methods provide solutions to many important networking tasks. 
We emphasize that many of our solutions are practical and can be applied in real
networks. Messages required for local checking can easily be piggybacked on the "keep- 
alive" traffic sent between neighbors in real networks. Thus solutions based on the 
first three of our general methods (all of which are based on local checking) can be 
added to real networks without appreciable loss of efficiency. Solutions based on the 
synchronous compilers of Chapter 9 can be practical if either the time complexity of the
underlying synchronous protocol is low (for the Rollback compiler) or if the underlying 
synchronous protocol has an efficient synchronous checker (for the Resynchronizer 
compiler). A disadvantage of the synchronizer methodology is that the method tends 
to slow down the network to the speed of the slowest link. Thus the synchronous 
compiler methodology is best suited for use in networks where all links have roughly 
the same speed. 

Among the most useful practical protocols described in this thesis are the stabilizing 
reset protocol of Chapter 7 (which was briefly tested in a trial implementation on the 
AUTONET) and the spanning tree and topology update protocols of Chapter 8. The 
topology update protocol also illustrates another general paradigm that may be useful 
in practice. We start with a simple protocol P that uses unbounded sequence numbers 
and use global correction to convert P into a stabilizing version P' that uses bounded 
sequence numbers. In the absence of catastrophic errors, P' is as efficient (except for 
the small overhead of local checking) and simple as P. 

10.1.3 Modularity Theorem 

The Modularity Theorem (Theorem 3.5.7) is simple but extremely useful. It helps 
us to prove facts about the stabilization of a big system by proving facts about the 
stabilization of each of its parts, as long as each part is suffix-closed. The modularity 
theorem gives us a formal basis for a building-block approach. For example, we can 
start with a stabilizing implementation of a UDL; use this to build a stabilizing reset 
protocol as shown in Chapter 7; and then use the reset protocol as a building block 
to construct a spanning tree protocol (Chapter 8) and a compiler for synchronous 
protocols (Chapter 9). As we have argued at the end of Chapter 4, the requirement 
that each of the parts be suffix-closed is not very restrictive. Essentially, this is because 
our methods tend to produce CIOAs (i.e., automata in which every reachable state is 
a start state) that are suffix-closed. 
As we build up a complex stabilizing protocol in several layers, the stabilization
time of the system can be calculated by applying the Modularity Theorem and the
Transitivity Lemma for behaviors (Lemma 3.2.6). For example, suppose the overall
system is described by an automaton P. Suppose also that P is identical to the
stabilizing reset protocol $\mathcal{R}^+$ of Chapter 7 except that every UDL $C_{u,v}$ in $\mathcal{R}^+$ is replaced
by a stabilizing implementation of a UDL called $C'_{u,v}$. Suppose that $C'_{u,v}$ stabilizes to
the behaviors of $C_{u,v}$ in time $t_1$. Then by applying the Modularity Theorem (which
is possible because all the constituent automata in $\mathcal{R}^+$ are UIOA) we conclude that
P stabilizes to the behaviors of $\mathcal{R}^+$ in time $t_1$. But we know from Chapter 7 that $\mathcal{R}^+$
stabilizes to the behaviors of $\mathcal{R}|L$ in some constant time, say $t_2$. (Recall that $\mathcal{R}|L$ is a
"correctly working" reset protocol in which all local predicates are true in the initial
state.) Thus we conclude from the Transitivity Lemma that P stabilizes to $\mathcal{R}|L$ in
time $t_1 + t_2$. Notice that the stabilization times of each "layer" add up due to the
Transitivity Lemma. Proceeding similarly, we can calculate the stabilization time of
a version of the stabilizing tree protocol of Chapter 8 in which the reset protocol $\mathcal{R}|L$
is replaced by P.$^3$
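The layered calculation above can be summarized as a short derivation. Here $A \leadsto_t B$ is our own shorthand (not notation from the thesis) for "A stabilizes to the behaviors of B in time t". The last line is an illustrative extension: $T[\cdot]$ is a hypothetical name for the tree protocol of Chapter 8 with the given reset module plugged in, $t_3$ is an assumed stabilization time for $T[\mathcal{R}|L]$, and $S$ is a hypothetical name for its specification; its first conjunct follows from the Modularity Theorem as noted in footnote 3.

```latex
\begin{align*}
  C'_{u,v} \leadsto_{t_1} C_{u,v} \ \text{for every UDL}
    &\;\Longrightarrow\; P \leadsto_{t_1} \mathcal{R}^+
    && \text{(Modularity Theorem)} \\
  P \leadsto_{t_1} \mathcal{R}^+ \;\wedge\; \mathcal{R}^+ \leadsto_{t_2} \mathcal{R}|L
    &\;\Longrightarrow\; P \leadsto_{t_1 + t_2} \mathcal{R}|L
    && \text{(Transitivity Lemma)} \\
  T[P] \leadsto_{t_1 + t_2} T[\mathcal{R}|L] \;\wedge\; T[\mathcal{R}|L] \leadsto_{t_3} S
    &\;\Longrightarrow\; T[P] \leadsto_{t_1 + t_2 + t_3} S
    && \text{(Transitivity Lemma)}
\end{align*}
```

Each additional layer contributes one more additive term, which is why the stabilization times of the layers simply add up.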

Our theorem can be viewed as a generalization of previous modularity results 
[DIM90]. Previous results only applied to the case when a lower layer protocol com- 
puted values of a shared variable that were read by a higher layer protocol. By contrast, 
our theorem applies to more dynamic interaction between the various components of 
a system. 

10.1.4 Modelling 

The models and proof techniques we use in this thesis are based on the large body of 
existing work that has been done in the area of protocol verification. Our model of 
computation is based on the timed I/O automaton [MMT91] which is in turn based 
on the I/O automaton of [LT89]. We have even described the shared memory network 
model in Chapter 4 as an I/O automaton that meets certain restrictions. 

We also use standard proof techniques. We use mapping techniques (Refinement
Mapping Theorem, Theorem 3.4.3) to show that one automaton has the same behaviors
as another automaton. We prove that a local predicate is closed using the standard
inductive techniques used to prove program invariants. Perhaps the only unusual
technique is the Execution Convergence Theorem (Theorem 3.4.5), which is a key tool
for proving stabilization results. To apply this theorem we have to prove a stability
condition and a "liveness" condition. We prove the stability condition using standard
inductive arguments. We prove the "liveness" condition by proving time bounds on the
occurrence of events. We have proved these time bounds in a fairly ad hoc, operational way.
However, there is no reason why the time bounds could not be established by more
rigorous inductive arguments (as in [LA90]). Thus while the Execution Convergence
Theorem may appear slightly unusual, it is really a combination of existing verification
techniques.

$^3$ This time, however, we can apply the Modularity Theorem because $\mathcal{R}|L$ is a CIOA and the other
nodes implementing the spanning tree protocol are UIOA.

We believe this continuity (with the body of existing work in verification) is an 
advantage of our work. By contrast, previous papers on stabilization have sometimes 
tended to invent new models and proof techniques. Despite our strong linkage with 
existing work, we do have two interesting modelling contributions in this thesis. These 
are the use of a behavior specification for stabilization and the use of unit storage Data 
Links. 

Behavior Specification 

The definitions of stabilization in terms of external behaviors are different from
previous definitions that are in terms of the states and executions of the underlying
automaton. The external behavior definitions allow us to say that automaton A stabilizes to
another automaton B even though A and B have different state sets. This is most
useful when A is a low level model (e.g., an implementation) and B is a high level model
(e.g., a specification). This can be done using a definition of stabilization in terms of 
executions if we are prepared to introduce an abstraction function into the definitions 
as in Lamport's work [Lam83]. However, we prefer the behavior definitions as they 
seem to be more natural; we prefer to use the equivalent of an abstraction function to 
prove behavior stabilization results using the Refinement Mapping Theorem. 

Unit Storage Data Links 

In a stabilizing setting it is necessary to define Data Links that have bounded storage. 
First, as shown in [DIM91a], almost any non-trivial network task is impossible in 
a stabilizing setting in which the links have unbounded storage and the nodes are
restricted to be finite state machines. Second, bounded storage models correspond to 
physical reality. 

Thus we use the standard asynchronous message passing model of a computer 
network except that each link is what we call a Unit Storage Data Link (UDL) that 
can store at most one packet. We have chosen unit storage links (UDLs) because 
they are practical (see Section 5.2) and they can be modelled elegantly. We have also 
defined a stabilizing interface to a UDL. This is done by having the link periodically 
deliver a free signal (to avoid deadlock) and by having the sender keep a variable that 
indicates whether the link is free. We hope the UDL model will be used by others. 
The UDL model can easily be generalized to Bounded Storage Data Links by changing 
the free signal to report the number of packets currently stored in the link. 
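A minimal Python sketch of the stabilizing UDL interface described above follows. The class and method names (`UDL`, `Sender`, `tick`, `on_free`, `simulate`) are hypothetical; time advances in discrete ticks, and a packet sent into a full link is simply dropped. These are assumptions of the sketch, not the model of Section 5.2. What it illustrates is the role of the periodic free signal: a sender that starts with a corrupted `free = False` flag while the link is empty would deadlock without it.

```python
from typing import Any, Callable, List


class UDL:
    """Hypothetical unit storage data link: holds at most one packet."""

    def __init__(self, packet: Any = None) -> None:
        self.packet = packet  # arbitrary initial contents (possibly garbage)

    def send(self, pkt: Any) -> None:
        # Assumption of this sketch: a packet sent while the link is full is dropped.
        if self.packet is None:
            self.packet = pkt

    def tick(self, deliver: Callable[[Any], None], free: Callable[[], None]) -> None:
        """One time unit: deliver the stored packet (if any), then signal free."""
        if self.packet is not None:
            pkt, self.packet = self.packet, None
            deliver(pkt)
        free()  # the link is empty here, so it (re)sends the free signal every tick


class Sender:
    def __init__(self, outbox: List[Any], free: bool) -> None:
        self.outbox = list(outbox)
        self.free = free  # sender's belief that the link is empty; may start out wrong

    def try_send(self, link: UDL) -> None:
        if self.free and self.outbox:
            link.send(self.outbox.pop(0))
            self.free = False

    def on_free(self) -> None:
        self.free = True


def simulate(sender: Sender, link: UDL, ticks: int) -> List[Any]:
    received: List[Any] = []
    for _ in range(ticks):
        sender.try_send(link)
        link.tick(received.append, sender.on_free)
    return received
```

For example, `simulate(Sender(["a", "b", "c"], free=False), UDL(), 4)` returns `["a", "b", "c"]`: the first tick delivers only a free signal, which repairs the sender's flag. A bounded storage variant would report a packet count instead of a single bit.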

Earlier papers in stabilization (e.g., [DIM90]) seem to have used shared memory 
models for communication in order to avoid the problems caused by unbounded storage 
links. It does seem very likely (see Open Problems) that protocols that work in the low 
atomicity shared memory model of [DIM90] can be transformed to work correctly in our 
network model. However, we believe that our protocol descriptions are more accessible 
to designers of "real" network protocols because most "real protocol" specifications 
assume the use of message passing primitives. 

10.1.5 Understanding Existing Work 

The concept of local checking helps in crisply understanding many existing stabilizing 
protocols. Chapter 4 shows that some existing work in the shared memory model can 
be understood crisply in terms of local checking and correction. We believe that many 
existing stabilizing protocols can be understood using three general ideas - one way 
checking and correction, counter flushing, and timer flushing. 

One Way Checking and Correction 

In Chapter 8, we said that a protocol P was one-way checkable if it was locally check- 
able using what we called one-way predicates. Intuitively, a one-way predicate is a 
predicate that involves the state of a node u, the state of any neighbor v of u, and 
the state of the link between u and v. Unlike a general local predicate, a one-way 
predicate only depends on the state of one of the links between the two nodes. We
saw that one-way checkable protocols could be checked without the need for a local 
snapshot - it is sufficient for each node to periodically send its state to its neighbors. 
The protocols in [Per83] and [Per85] do checking in this way. 

In some cases, the protocol is also one-way correctable - i.e., we can apply local 
correction to a one-way checkable protocol without the need for a local reset of the link 
subsystem. For example, when v receives a copy of u's state and detects a violation
of the one-way predicate from u to v, it may be possible for v to apply a local reset
function to its own state so as to make the one-way predicate true. Of course, this
could falsify other "adjacent" one-way predicates. For the protocol to be one-way
correctable, the dependency relation among the one-way predicates must be acyclic,
as in the definition of local correctability. For example, the Rollback protocol of
Chapter 9 is one-way correctable. We speculate that the stabilizing topology update
protocol of Spinelli-Gallager [SG89] is also one-way correctable.
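The following hypothetical Python sketch illustrates one-way checking and correction on a rooted tree. Each node v keeps a value `dist[v]`, and the one-way predicate for the link from its parent u to v is assumed to be `dist[v] == dist[u] + 1` (with `dist[root] == 0`). In every round each node sends its state to its neighbors, and a child that sees a violation resets only its own state. The sketch ignores link contents and delivers states synchronously, which is a considerable simplification. Because the predicates depend on each other only along the acyclic parent relation, a node at depth d is correct after at most d + 1 rounds.

```python
from typing import Dict, Optional


def one_way_correct(parent: Dict[str, Optional[str]],
                    dist: Dict[str, int], rounds: int) -> Dict[str, int]:
    """Hypothetical example: the one-way predicate for the link (u, v), with
    u = parent[v], is dist[v] == dist[u] + 1; the root (parent None) needs 0."""
    dist = dict(dist)
    for _ in range(rounds):
        sent = dict(dist)  # each node periodically sends its state to its neighbors
        for v, u in parent.items():
            expected = 0 if u is None else sent[u] + 1
            if dist[v] != expected:   # v detects a violation of the predicate from u to v
                dist[v] = expected    # ... and applies a local reset to its own state only
    return dist


parent = {"r": None, "a": "r", "b": "a", "c": "b", "d": "r"}  # a tree of height 3
garbage = {"r": 5, "a": 0, "b": 9, "c": 2, "d": 7}           # arbitrary initial state
print(one_way_correct(parent, garbage, rounds=4))
```

Here 4 rounds (height plus one) yield `{"r": 0, "a": 1, "b": 2, "c": 3, "d": 1}`; after only 3 rounds the deepest node `c` can still be wrong, since each correction can only rely on values its parent sent in the previous round.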

Counter Flushing 

Suppose a sender wishes to periodically send a REQUEST packet to a set of network 
nodes. The responders must each send back a RESPONSE packet before the sender 
sends its next request. In Chapter 5, for example, we implemented local snapshots 
and resets using such a request-response protocol initiated by the leader of each link 
subsystem. In order to properly match responses to requests, the sender numbers each 
request with a counter. Let m be the number of packets that can be in transit between 
the sender and responder and let n be the number of responders. Then the sender uses 
a counter that has Max = m + n + 1 distinct values. For example, in Chapter 5 we 
used a counter in the range 0...3 because there can be at most two packets in transit 
in a link subsystem and there is only one responder. 

Responders only accept REQUEST packets with a number different from the last 
REQUEST accepted. After accepting a REQUEST the responder sends back a RESPONSE 
with the same number as the REQUEST. The sender retransmits the current REQUEST 
till it receives each matching RESPONSE with the same number. After all matching 
RESPONSE packets arrive, the sender increments its counter. 

The size of Max ensures that within Max increments of the counter, the sender 
will reach what we call a "fresh" counter value - i.e. a counter value that is not 
currently stored in either the links or the responders. We call the method counter
flushing because the request-response protocol must guarantee the following "flushing"
property. Suppose the sender sends a request numbered c, where c is a fresh value. Then
after all matching responses to this request arrive, there must be no counter values other
than c stored in the links or at the responders. In other words, the sending of
a freshly numbered request and the receipt of all matching responses should "flush"
the links and responders of "old" counter values.
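To make the flushing property concrete, here is a small Python simulation of counter flushing in a single link subsystem, using the numbers quoted above for Chapter 5 (m = 2 packets in transit, n = 1 responder, Max = 4). The modelling choices are assumptions of this sketch rather than the protocol of Chapter 5: each direction is a FIFO link holding at most one packet, enabled actions are scheduled at random, and the responder answers every REQUEST (including duplicates) with a RESPONSE carrying the same number, but only accepts a number different from the last one it accepted. Packets present in the arbitrary initial state are tagged as not genuine, so the check can distinguish stale information from requests the sender actually issued.

```python
import itertools
import random
from typing import List, Optional, Tuple

M, N = 2, 1        # packets that can be in transit, number of responders
MAX = M + N + 1    # distinct counter values used by the sender (4, i.e. 0...3)


def run(c: int, f0: Optional[int], r0: Optional[int], l0: int,
        steps: int, rng: random.Random) -> List[bool]:
    """Start from an arbitrary state and schedule enabled actions at random.

    Returns one entry per sender increment: True if, at that moment, the
    responder's latest accepted REQUEST was a genuine REQUEST(c) and no counter
    value other than c is left in the links (the "flushing" property)."""
    fwd: Optional[Tuple[int, bool]] = None if f0 is None else (f0, False)  # (number, genuine?)
    rev: Optional[int] = r0          # RESPONSE number in the responder -> sender link
    last = l0                        # number of the last REQUEST the responder accepted
    last_accept: Optional[Tuple[int, bool]] = None  # that REQUEST, if accepted during the run
    log: List[bool] = []
    for _ in range(steps):
        enabled = []
        if fwd is None:
            enabled.append("send")
        elif rev is None:
            enabled.append("respond")
        if rev is not None:
            enabled.append("receive")
        action = rng.choice(enabled)
        if action == "send":              # sender (re)transmits its current REQUEST
            fwd = (c, True)
        elif action == "respond":         # responder consumes a REQUEST
            req, fwd = fwd, None
            if req[0] != last:            # accept only a number different from the last one
                last, last_accept = req[0], req
            rev = req[0]                  # the RESPONSE carries the REQUEST's number
        else:                             # sender consumes a RESPONSE
            num, rev = rev, None
            if num == c:                  # matching RESPONSE: check flushing, then increment
                log.append(last_accept == (c, True) and (fwd is None or fwd[0] == c))
                c = (c + 1) % MAX
    return log


if __name__ == "__main__":
    rng = random.Random(0)
    for c0, f0, r0, l0 in itertools.product(range(MAX), [None, *range(MAX)],
                                            [None, *range(MAX)], range(MAX)):
        log = run(c0, f0, r0, l0, steps=200, rng=rng)
        assert len(log) > MAX and all(log[MAX:])
    print("flushing holds after Max increments from all 400 initial states")
```

The loop tries every one of the 400 initial states and checks that, once the sender has performed Max increments, every further increment satisfies the flushing property. Early increments may be triggered by stale responses or by a stale responder value; that is exactly the transient that the Max distinct counter values are meant to absorb.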

In Chapter 5, the flushing condition is guaranteed because the sender and receiver 
are connected by two FIFO links in either direction. Similar forms of counter flushing 
can be used to implement Data Links ([AB89]) and token passing ([DIM91a]) in link
subsystems with bounded storage. Counter flushing is, however, not limited to link 
subsystems. The first example in [Dij74] can be simply understood as counter flushing 
in a unidirectional ring (see Appendix E for more details). Katz and Perry [KP90] 
extend the use of counter flushing to arbitrary networks in an ingenious way. Our sta- 
bilizing end-to-end protocol ([APV91b]) is obtained by first applying local correction 
to the Slide protocol [AGR92] and then applying a variant of counter flushing to the 
Majority protocol of [AGR92]. 
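The counter flushing reading of the first example in [Dij74] (Dijkstra's K-state token ring) can be sketched in Python as follows. Machine 0 acts as the sender, incrementing its counter modulo K, and every other machine acts as a responder that copies its predecessor's value; a machine holds the token (is "privileged") when its guard is true. The sketch assumes a central daemon (one privileged machine moves per step, chosen at random here) and K ≥ n, the usual condition for this algorithm; the function names are hypothetical.

```python
import itertools
import random
from typing import List


def privileged(x: List[int]) -> List[int]:
    """Machines whose guard is true in Dijkstra's K-state ring."""
    n = len(x)
    out = [0] if x[0] == x[n - 1] else []                # machine 0 plays the sender
    out += [i for i in range(1, n) if x[i] != x[i - 1]]  # others differ from predecessor
    return out


def move(x: List[int], i: int, k: int) -> None:
    if i == 0:
        x[0] = (x[0] + 1) % k   # the sender advances its counter
    else:
        x[i] = x[i - 1]         # a responder adopts its predecessor's value


def run_ring(x: List[int], k: int, steps: int, rng: random.Random) -> List[int]:
    """Central daemon: at each step one privileged machine, chosen at random, moves."""
    x = list(x)
    for _ in range(steps):
        move(x, rng.choice(privileged(x)), k)  # some machine is always privileged
    return x


if __name__ == "__main__":
    n = k = 4
    rng = random.Random(1)
    for start in itertools.product(range(k), repeat=n):
        assert len(privileged(run_ring(list(start), k, 200, rng))) == 1
    print("all", k ** n, "start states reach exactly one privilege")
```

As with counter flushing, the responders can only copy values, never invent them, so the sender eventually reaches a value held by no other machine; once that value has travelled around the ring, exactly one machine is privileged and this remains true.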

Timer Flushing 

The main idea is to bound the lifetime of "old" state information in the network. This 
is done by using node clocks that run at approximately the same rate and by enforcing 
a maximum packet lifetime over every link. State information that is not periodically 
refreshed is "timed out" by the nodes. In Perlman's [Per83] topology update protocol, 
timer flushing is used to get rid of erroneous updates that are numbered with the 
maximum possible sequence number. In Perlman's [Per85] spanning tree protocol, 
timer flushing is used to get rid of "ghost" roots (see Chapter 8 for details). Spinelli
[Spi88b] uses timer flushing to build stabilizing Data Link and virtual circuit protocols. 
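A minimal Python sketch of timer flushing follows, assuming integer clock ticks and a hypothetical `TimedTable` that holds soft state (for example, a neighbor's claim about which node is the root). Entries that are not refreshed within `lifetime` ticks are timed out. One detail matters in a stabilizing setting: a corrupted timestamp may lie in the future, so the sketch clamps such timestamps to the current time; otherwise a "ghost" entry could survive indefinitely. This illustrates the idea only and is not the mechanism of [Per83], [Per85], or [Spi88b].

```python
from typing import Any, Dict, Optional, Tuple


class TimedTable:
    """Hypothetical soft-state table: entries survive only while refreshed."""

    def __init__(self, lifetime: int,
                 entries: Optional[Dict[str, Tuple[Any, int]]] = None) -> None:
        self.lifetime = lifetime
        # key -> (value, time of last refresh); may start out as arbitrary garbage
        self.entries: Dict[str, Tuple[Any, int]] = dict(entries or {})

    def refresh(self, key: str, value: Any, now: int) -> None:
        self.entries[key] = (value, now)

    def expire(self, now: int) -> None:
        for key, (value, stamp) in list(self.entries.items()):
            if stamp > now:                       # corrupted timestamp "from the future":
                self.entries[key] = (value, now)  # clamp it so the entry still ages out
            elif now - stamp > self.lifetime:     # not refreshed recently: time it out
                del self.entries[key]


table = TimedTable(5, {"ghost-root": ("root=0", 1000), "stale": ("x", -50)})
for now in range(20):
    table.refresh("a", "live", now)  # a neighbor that keeps refreshing its state
    table.expire(now)
print(sorted(table.entries))         # only the refreshed entry survives
```

With a lifetime of 5 ticks, the ghost entry is clamped at time 0 and removed at time 6, while the continuously refreshed entry `"a"` is kept.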

10.2 Open Questions and Further Problems 

The following is a list of further problems. They are arranged under four categories: 
modelling, increased understanding of local checking and correction, new algorithms, 
and new directions. 
10.2.1 Modelling

The following is a list of what we believe are the further problems that are motivated 
by our work in modelling stabilizing protocols. 

• Extend model and theorems of Chapter 3 to allow the use of ran- 
domization and lower bounds on the time between actions: The model 
of Chapter 3 makes no provision for randomization or for lower bounds on the 
time between actions. While the model in Chapter 3 is sufficient for modelling 
asynchronous protocols it cannot be used to model many interesting randomized 
and timing based algorithms. This seems to be an important problem. The hard 
part is extending the model to obtain corresponding versions of the modularity 
and transitivity results. 

• Find a Proof Technique for Generalized Stabilization Definitions: The
definitions of a stabilizing reset protocol and other protocols can be made more
elegant if we modify the stabilization definitions as follows. We only require that
the t-suffix of a behavior (execution) be a suffix of a behavior (execution) of
the target set. One problem with this modified definition is that we know of no
good proof technique to prove that the behaviors (executions) of an automaton 
are suffixes of a specified set of behaviors (executions). While a general proof 
technique for this purpose may be infeasible, it may be possible to find a proof 
technique that works for a large number of cases. 

• Discover stronger variants of the modularity theorem. The modularity 
theorem allows us to infer the stabilization properties of a large system from the 
stabilization properties of its pieces. However, we required that each piece be 
suffix-closed. It is natural to look for weaker or alternate conditions. This may 
be somewhat technical as the suffix-closed requirement does not appear to be 
very restrictive. 

• Find a General Definition of Stabilization Bandwidth: In Chapter 9, 
we intuitively described an important measure for a stabilizing protocol - the 
amount of periodic bandwidth the protocol consumes. The definition we gave 
in Chapter 9 only applies to protocols that have a certain structure. A precise, 
general definition would be very useful. 
• Find a way of Compiling Stabilizing Protocols in the Shared Memory
Model into our Network Model. A number of interesting stabilizing
protocols have been described using the low atomicity, shared memory model
introduced by [DIM90]. It would be nice to have a compiler that could
convert protocols from their model to our network model and vice versa. This 
seems to be feasible. 

10.2.2 Increased Understanding of Local Checking and Cor- 
rection 

Our understanding of local checking and correction is far from complete. This moti- 
vates the following problems: 

• Obtain a better understanding of local checkability. Recall that a protocol 
is locally checkable for some good predicate L, if L is a conjunction of local 
predicates, and each local predicate is closed. Thus we can study this problem 
by asking two separate questions. 

— How much state do we need to add to a protocol so that its
legal states are a conjunction of local predicates? In the token
passing protocol of Chapter 6 and the reset problem of Chapter 7, we made 
protocols locally checkable by adding a constant amount of state to each 
node. It is natural to ask what the minimum amount of storage is that 
must be added in order to make a protocol locally checkable. 

— Can any weakly locally checkable protocol be transformed into an 
(equivalent) locally checkable protocol? Recall that a weakly locally 
checkable protocol is a protocol whose legal states are a conjunction of 
local predicates; however, the local predicates are not necessarily closed. 
In Chapter 6 we described a simple protocol transformation that consisted 
of removing unexpected packet transitions. In many cases, this heuristic 
is sufficient to ensure that the local predicates are closed. We have used 
this heuristic successfully but we still don't completely understand when 
this heuristic is guaranteed to work. In Chapter 6, we described a sufficient 
condition called local extensibility. Are there weaker conditions than local 
extensibility? 
• Obtain a better understanding of local correctability. In Chapter 5, we
used a fairly operational definition of local correctability in terms of a local reset 
function. That definition is adequate for the thesis but it gives little insight into 
the structure of locally correctable protocols. Why are some protocols locally 
correctable but not others? Are there simpler sufficient conditions? It is also 
important to formally understand the connection between local correctability 
and locally checkable protocols that work in dynamic networks. 

• Generalize the Definitions of Locality and Local Checkability in Chap- 
ter 5: In Chapter 5, we defined locality in terms of link subsystems. More gener- 
ally, a subsystem is the composition of the nodes and channels corresponding to 
some subgraph of the network graph. For example, if we define locality using the
subsystems corresponding to the entire network graph, then our method reduces 
to the method of Katz and Perry [KP90]. Another interesting subsystem would 
consist of a single node and all its incoming and outgoing channels. Another 
interesting possibility is to consider subgraphs defined by the sparse network 
partitions [AP90] defined by Awerbuch and Peleg. 

Another simple generalization is to allow more than one local predicate per local
subsystem. For example, consider a system consisting of two link subsystems
and four link predicates, $L_1$, $L_2$, $L_3$, and $L_4$. Suppose the system is locally
correctable using a partial order $<$ such that $L_1 < L_2 < L_3 < L_4$. Then as we apply
local checking and correction to this system, $L_1$ will become true first, followed
by $L_2$, then $L_3$, and then $L_4$. Having multiple local predicates per local
subsystem is only useful if these local predicates are independently ordered in
the partial order. (If this were not the case we could simply work with a new
local predicate that is the conjunction of all the predicates.) An example of the
need for this generality is the Rollback protocol in Chapter 9.

• Obtain a better understanding of Synchronous Checking: In Chapter 9,
we introduced the notion of synchronous checking, which is a synchronous pro- 
tocol that can check the output of another synchronous protocol. Synchronous 
checking is a little easier than asynchronous checking because all the nodes run 
in lockstep and we do not have to worry about messages in channels. Also, the 
checker need only check the final output of the protocol and not the intermediate 
states. Are there synchronous checkers for minimum spanning tree and max flow 
protocols that are faster than the protocols being checked?

10.2.3 New Algorithms 

We suggest the following algorithmic problems: 

• Invent a practical stabilizing reset protocol that stabilizes in O(d) time,
where d is the actual diameter of the network. By way of comparison, the
reset protocol of Chapter 7 stabilizes in O(n) time, where n is the number of nodes
in the network. This seems to be a major open problem. A solution to this
problem would yield O(d) stabilizing solutions for the spanning tree, topology
update, and other problems described in Chapters 8 and 9. To be practical, the
constants hidden in the O(d) notation must be small (between 1 and 3).

• Invent a practical stabilizing token passing protocol for rings and ar- 
bitrary graphs: We have already described how to construct a tree from an 
arbitrary graph, and shown how to execute a stabilizing token passing protocol 
on a tree. However, the latency of token traversal on a tree can be quite high, 
for example if the original graph is a ring. The first example in [Dij74] appears 
extensible to token passing in a ring. An efficient stabilizing token passing pro- 
tocol for rings would be useful and practical. Existing token ring protocols such 
as the IEEE 802.5 and FDDI protocols have a number of ad hoc mechanisms to
deal with failures. 

• Invent a Stabilizing Clock Synchronization Scheme: Fault-tolerant clock 
synchronization schemes typically have to defend against Byzantine faults. It 
would be desirable to invent a practical, stabilizing version of such a protocol. 
This would be an interesting example of a protocol that is robust against both 
Byzantine faults that continue as well as catastrophic faults that stop. 

• Simplify Existing Flow Control Schemes for Transport Protocols and 
Data Links: A flow control scheme is a scheme by which a receiver regulates 
the rate at which a sender sends data in order to prevent buffer overflows at 
the receiver. In [CSV89] we propose an extremely simple stabilizing flow control 
scheme for physical links. It can be considered to be a trivial application of 
local checking and correction to the sender-receiver flow control protocol. The 
resulting protocol is robust and simple enough to be implemented in hardware.
Now, robust flow control schemes are a major component of Transport and Data 
link layer protocols. Perhaps existing flow control schemes can be simplified 
using local checking and correction. 

• Invent a stabilizing, distributed protocol to compute Sparse Parti- 
tions: Awerbuch and Peleg [AP90] have shown how to decompose a network 
into what they call sparse partitions. They use sparse partitions to build effi- 
cient solutions for online tracking of mobile users, network synchronization, and 
network routing with low memory. The most efficient distributed algorithm for 
constructing sparse partitions is based on the work of Linial and Saks [LS91]. 
A stabilizing distributed algorithm for sparse partitions would be an extremely 
useful tool. 

• Make the Synchronizer Methodology practical for networks with links 
of different speeds: The synchronizer methodology was introduced in [Awe85] 
and is extended to a stabilizing setting in Chapter 9. However, the method 
suffers from a severe drawback in networks in which links have varying delays. 
Essentially, it slows down all links to the speed of the slowest link. It may be 
possible to modify the methodology to avoid this drawback. 

• Invent a stabilizing version of the Bootstrap Protocol for End-to-End 
Communication: Our previous work on stabilizing end-to-end communication 
[APV91b] has concentrated on producing a stabilizing version of the simple and 
elegant Slide protocol [AGR92]. However, the Bootstrap Protocol [AG91] is 
more efficient in some cases, and so a stabilizing version would be of theoretical 
interest. The current solutions are still too inefficient (for example in storage) 
for the end-to-end problem to have any practical application. 

10.2.4 New Directions 

We suggest the following new directions for research in self-stabilization. 

• Local Checking for Randomized Protocols: Local checking of randomized 
protocols is an important research area. There are several randomized protocols 
(e.g., the Rabin-Lehman dining philosophers protocol of [RL81]) that use
randomization to break symmetry and to guarantee termination. Such protocols
can often be locally checked in a deterministic fashion. For example, in graph 
coloring, it is easy to check whether a neighboring node has the same color. If, 
however, the check reveals a problem, a randomized local reset action must often 
be applied. Thus in graph coloring, a node may randomly choose a new color 
once it discovers that it has the same color as one of its neighbors. Now the 
randomized local reset functions can be applied at a node after all other nodes 
have already chosen their colors. Thus the dynamics (and the analysis) of the 
stabilizing protocol may be quite different from that of the original randomized 
protocol. Preliminary work in this area has been done by Baruch Awerbuch, 
Leonore Cowen, and Mark Smith at MIT. 

Protocols whose behavior is correct only with high probability are even harder 
to check locally. Suppose that a (non-stabilizing) randomized protocol computes 
sparse partitions such that with high probability the partitions have low diam- 
eter. Then even if local checking can detect "poor" partitions, how can we tell 
whether the partitions are bad a) because of errors in the initial random bits at 
the nodes or b) because of a low probability outcome of the protocol? 

• Stabilizing Data Structures: We have suggested earlier that programs that 
run on a single shared memory can be made stabilizing by using domain re- 
striction - the states of the program are restricted to legal states. Consider the 
problem of building a stabilizing data structure (e.g., a dictionary or a queue) 
within the memory of a single processor. The data structure provides a certain 
set of operations (for example to insert and delete elements). The correctness 
of the data structure can be defined in terms of allowable sequences of opera- 
tions. This notion of correctness is similar to the behavior specifications of I/O 
automata. Thus we can define a stabilizing data structure to be one in which 
the data structure can begin in an arbitrary state; however, any sequence of op- 
erations on the structure will eventually have a suffix that is a correct sequence 
(or a suffix of a correct sequence) of the data structure. For example, in a stabi- 
lizing dictionary, we would require that any elements inserted after the structure 
stabilizes would be found when searched for. 

Now any such data structure can be trivially made stabilizing in the following 
way. Before any operation on the data structure, we first check the invariants of 
the data structure, and reinitialize the data structure if an error is found.
Unfortunately, this slows down regular operations. Thus if we charge for processing
(which we have not done in the distributed model in this thesis), there appears 
to be a tradeoff between stabilization time and the time to complete normal 
operations. For some data structures, the tradeoff can be extremely good. For 
example, we can easily implement a stabilizing, bounded size queue using a fixed 
size array. The head and tail pointers can be restricted to stay within the array 
bounds. However, it appears to be much harder to obtain good tradeoffs for say 
a tree-based implementation of a dictionary. We have done some preliminary 
thinking in this area [MPV90]. 
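The bounded queue mentioned above can be sketched as follows. This is a minimal Python illustration, not code from this thesis: the class and method names are hypothetical, the array size is assumed to be fixed, and we keep a head index plus an element count (a common variant of head and tail pointers that avoids the full/empty ambiguity). Domain restriction is applied before every operation: the head is reduced modulo the capacity and the count is clamped to the array size, so any corrupted value of these variables still denotes a legal queue.

```python
class StabilizingQueue:
    """Bounded FIFO queue kept in a fixed-size array (domain restriction).

    The control state is a head index and an element count. Before every
    operation both are forced back into their legal ranges, so any value
    left behind by a fault still denotes some legal queue.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.buf = [None] * capacity  # fixed-size array
        self.head = 0                 # index of the oldest element
        self.size = 0                 # number of stored elements

    def _restrict(self):
        # Map possibly corrupted control values into the legal domain.
        self.head %= self.capacity
        self.size = min(max(self.size, 0), self.capacity)

    def enqueue(self, x):
        self._restrict()
        if self.size == self.capacity:
            return False              # full: reject the element
        self.buf[(self.head + self.size) % self.capacity] = x
        self.size += 1
        return True

    def dequeue(self):
        self._restrict()
        if self.size == 0:
            return None               # empty
        x = self.buf[self.head]
        self.head = (self.head + 1) % self.capacity
        self.size -= 1
        return x


if __name__ == "__main__":
    q = StabilizingQueue(4)
    q.head, q.size = 1234, -7         # simulate a corrupted initial state
    q.enqueue("a")
    q.enqueue("b")
    print(q.dequeue(), q.dequeue(), q.dequeue())  # a b None
```

After a corruption the queue may still hand out garbage that was already in the array, but every element enqueued afterwards is dequeued in FIFO order, which is the suffix property required above. The per-operation overhead is one modulus and one clamp.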

• Applying Self-Stabilization to Other Areas: Most of the stabilizing pro- 
tocols described in this thesis are used for routing, scheduling, and resource 
allocation - tasks typically found in the network and Data Link layers of the 
communication hierarchy. However, in principle there is no reason why self- 
stabilization cannot be added as an "extra envelope" of fault-tolerance for higher 
layer protocols - e.g., file transfer and database protocols. Such protocols should 
be designed to avoid errors in the case of common faults such as node and link 
crashes. However, it is also desirable that these protocols recover by themselves 
after catastrophic errors. Self-stabilization is also applicable at the lowest layer 
of the protocol hierarchy. For example, it would be desirable to have stabilizing 
clock recovery and framing protocols at the physical layer. 

• Generalized Local Checking: The method of Katz and Perry [KP90] consists 
of a single leader that checks the entire network. In our method of local checking, 
nodes independently (and in parallel) check each link subsystem. The Katz and 
Perry method is more general but less efficient. Local checking of link subsystems 
is efficient but only applicable to locally checkable protocols. Thus there may 
be intermediate approaches in which the notion of locality is generalized to an 
arbitrary set of subnetworks of the original network. 



10.3 Summing Up 

Self-stabilization abstracts the ability of a protocol to tolerate catastrophic faults 
that stop. On the other hand, the cost of self-stabilization is often low. Thus
self-stabilization can provide a cheap way to improve the fault-tolerance of network
protocols.

Local checking can be used to design efficient, stabilizing protocols. The resulting 
protocols can be proved correct in a systematic way. The overhead of local checking can 
be piggybacked on existing keep-alive traffic between network nodes. Local checking 
may prove to be a useful debugging tool because it provides a continuous check of 
system predicates, and violations can be logged. 
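As an illustration of local checking used as a debugging aid, the sketch below evaluates one local predicate per link subsystem and logs each violation. It is a centralized Python simulation under stated assumptions: the function name is hypothetical, and the `snapshot` callback is assumed to return the tuple (state of u, contents of link (u,v), contents of link (v,u), state of v), which in the protocols of this thesis would be assembled by a request/response exchange rather than read directly.

```python
import logging
from typing import Callable, Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]
# (state of u, packet on link (u,v), packet on link (v,u), state of v)
LinkView = Tuple[object, object, object, object]


def check_links(
    edges: List[Edge],
    snapshot: Callable[[Hashable, Hashable], LinkView],
    predicates: Dict[Edge, Callable[[LinkView], bool]],
    log: logging.Logger,
) -> List[Edge]:
    """Evaluate the local predicate of every link subsystem; log and return violations."""
    violations = []
    for u, v in edges:
        if not predicates[(u, v)](snapshot(u, v)):
            log.warning("local predicate violated on link (%s, %s)", u, v)
            violations.append((u, v))
    return violations


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    state = {"a": 1, "b": 1, "c": 2}
    edges = [("a", "b"), ("b", "c")]
    # Hypothetical predicate: both links empty and the two neighbors agree.
    preds = {e: (lambda t: t[1] is None and t[2] is None and t[0] == t[3]) for e in edges}
    print(check_links(edges, lambda u, v: (state[u], None, None, state[v]), preds,
                      logging.getLogger("localcheck")))
```

In a running protocol the check would be repeated periodically at the leader of each link, and its messages could share the keep-alive traffic mentioned above; here the loop runs once.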

Our research into stabilization and local checking has helped us better understand 
existing protocols, both stabilizing and non-stabilizing. For instance, in the process of 
understanding Global Correction, we realized that a Global Reset protocol provides 
a mating relation that is a generalization of the guarantees of Data Link protocols. 
We also saw that many existing protocols use a special form of local checking that we 
called one-way checking. 

Whitehead once said that the interest of a generalization is the interest of a road for 
those who know what travel is; and the pleasure of the road has its roots in the labor 
of the journey. After struggling to understand many examples of stabilizing protocols, 
we have begun to appreciate the simplicity of understanding provided by such concepts 
as local checking and counter flushing. They have helped us to see things a little more 
clearly and to travel a little distance. We hope that our readers will go much further. 
Appendix A



Notation 



We summarize notation that is commonly used in this thesis. Other definitions can
be found using the index. For example, if the notation for $C_{u,v}$ refers to a UDL, the
definition of a UDL can be found by using the index.

• $\{u,v\}$: For a pair of neighboring nodes u and v, this denotes the unordered
pair corresponding to the directed edge (u,v). This is useful in defining a partial
order on unordered pairs of nodes.

• $A|L$: For any automaton A and any subset L of the states of A, $A|L$ is the
automaton that is identical to A except that the set of start states of $A|L$ is L.

• $C_{u,v}$: The Unit Capacity Data Link (UDL) corresponding to directed edge (u,v)
in a topology graph.

• $Conj(\mathcal{L})$: For any link predicate set $\mathcal{L}$, $Conj(\mathcal{L})$ denotes the conjunction of the
predicates in $\mathcal{L}$.

• $\mathcal{N}(t)$: For any network automaton $\mathcal{N}$, $\mathcal{N}(t)$ is the automaton that is identical to
$\mathcal{N}$ except that the link and node delays in $\mathcal{N}(t)$ are equal to $t$.

• $Q_{u,v}$: The packet stored on the UDL corresponding to directed edge (u,v) in a
topology graph. If the value of the variable is nil, then no packet is stored.

• $queue_u[v]$: A queue of packets for outgoing edge (u,v) in the node automaton
corresponding to node u. See the definition of a node automaton.
• $xqueue_u[v]$: The concatenation of $Q_{u,v}$ and $queue_u[v]$. Only used in some proofs.
Can be read as the extended queue at node u corresponding to neighbor v.

• U(A): For any automaton A, U(A) is the automaton that is identical to A except 
that the set of start states of U(A) is equal to the set of states of A. Thus U(A) 
creates a UIOA from an IOA. 

The following symbols are frequently used to denote the following quantities:

$<$: A partial order on a set of local predicates.

$\alpha$: An execution of an automaton.

$a_j$: The j-th action in a behavior or execution.

$\beta$: A behavior of an automaton.

$f$: A reset function.

$G$: A graph or a topology graph.

$L$: A predicate (set of states) of an automaton.

$L_{u,v}$: A local predicate for edge (u,v).

$\mathcal{L}$: A set of local predicates.

$\mathcal{N}$: A network automaton.

$\mathcal{N}^+$: An augmented network automaton.

$P$: A packet alphabet.

$s$: A state of an automaton. A state with a subscript like $s_i$ often denotes the
$i$-th state in an execution.

$t$: A time. Times with subscripts like $t_n$, $t_p$ denote special constants; many of
these are listed in the index.






Appendix B 

Proofs for Chapter 5 



Let $\mathcal{N}^+ = Augment(\mathcal{N}, \mathcal{L}, f)$. We will use $s$ and $\tilde{s}$ to denote states of $\mathcal{N}^+$. In deriving
time bounds, recall that all locally controlled actions at nodes take $t_n$ time; also, for
any link, all actions have upper bound $t_l$.

B.1 Any Execution of $\mathcal{N}^+$ is infinite

Our first lemma states that all executions of $\mathcal{N}^+$ are infinite and hence in all executions
of $\mathcal{N}^+$, time grows without bound. This follows because we have ruled out the
possibility of so-called Zeno executions in our model (see Chapter 2). We will assume
this implicitly in what follows without making explicit reference to this lemma.

Lemma B.1.1 Any execution $\alpha$ of $\mathcal{N}^+$ is infinite.

Proof: Suppose not. Then there is some final state $s$ of $\alpha$ after which no action takes
place. Consider any channel $C_{u,v}$. If $s.Q_{u,v} = nil$, then a $FREE_{u,v}$ action is enabled;
but if $s.Q_{u,v} = p \neq nil$, then a $RECEIVE_{u,v}(p)$ action is enabled. Both cases contradict
the assumption that $s$ is a final state. ∎

Notice also that since $\mathcal{N}^+$ is a UIOA, any suffix of an execution of $\mathcal{N}^+$ that begins
with a timed state is also an execution of $\mathcal{N}^+$. We will assume this implicitly when we
apply some of the lemmas and claims described below.
B.2 Basic Properties of links

The first lemma states that once a link is drop-free (see Definition 5.6.7), it remains 
drop-free. 

Lemma B.2.1 For any (u,v) and any transition $(s, \pi, \tilde{s})$ of $\mathcal{N}^+$, if $s \in F_{u,v}$, then
$\tilde{s} \in F_{u,v}$.

Proof: If $s.freeq_u[v] = true$, then since $s \in F_{u,v}$, $s.Q_{u,v} = nil$. Thus the only action
that can cause $\tilde{s}.Q_{u,v} \neq nil$ is a $SEND_{u,v}(p)$ event, which also sets $\tilde{s}.freeq_u[v] = false$.
If $s.freeq_u[v] = false$, then the only action that can cause $\tilde{s}.freeq_u[v] = true$ is a
$FREE_{u,v}$ event, which in turn can only occur if $s.Q_{u,v} = \tilde{s}.Q_{u,v} = nil$. ∎

The next lemma says that (u,v) becomes drop-free after the first action that sends
a packet to $C_{u,v}$, or the first signal from $C_{u,v}$ that it is free.

Lemma B.2.2 For any edge (u,v) and any execution $\alpha$, (u,v) is drop-free in all states
of $\alpha$ that follow a $FREE_{u,v}$ or a $SEND_{u,v}(p)$ action.

Proof: After a $FREE_{u,v}$ action, $Q_{u,v} = nil$, and after a $SEND_{u,v}(p)$ action, $freeq_u[v] = false$.
Hence if $s_i$ follows either of these actions in $\alpha$, then $s_i \in F_{u,v}$. The lemma follows
from the stability of $F_{u,v}$ (Lemma B.2.1). ∎

Once a link is drop-free we can be sure that all packets sent on the link will be 
delivered. 
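Lemmas B.2.1 and B.2.2 can be exercised on a toy model of a single UDL. In this Python sketch (an illustration, not the thesis's automaton), the class `UDL` and its fields are hypothetical names. We take the drop-free predicate $F_{u,v}$ to be "$freeq_u[v] = true$ implies $Q_{u,v} = nil$", which is how the proof of Lemma B.2.1 uses it, and we assume the sender issues $SEND_{u,v}$ only when $freeq_u[v]$ is true. Random runs from every initial state check that $F_{u,v}$ holds after the first $FREE_{u,v}$ or $SEND_{u,v}$, and that the only packet that can ever be lost is one stored in the link initially.

```python
import random


class UDL:
    """Toy unit capacity data link C_{u,v} together with the sender flag freeq_u[v]."""

    def __init__(self, q=None, freeq=False):
        self.q = q          # Q_{u,v}: stored packet, or None for nil
        self.freeq = freeq  # freeq_u[v]: sender believes the link is free
        self.dropped = []   # packets overwritten before they were delivered

    def drop_free(self):
        # F_{u,v} (assumed form): if the sender thinks the link is free, it is empty.
        return not self.freeq or self.q is None

    def enabled(self):
        acts = ["send"] if self.freeq else []
        acts.append("receive" if self.q is not None else "free")
        return acts

    def send(self, p):
        if self.q is not None:      # only possible from a corrupted initial state
            self.dropped.append(self.q)
        self.q, self.freeq = p, False

    def receive(self):
        p, self.q = self.q, None
        return p

    def free(self):
        self.freeq = True           # FREE_{u,v} is enabled only when q is None


def random_run(link, steps, rng):
    """Apply random enabled actions; return False if F_{u,v} fails after a FREE or SEND."""
    armed = False
    for i in range(steps):
        action = rng.choice(link.enabled())
        if action == "send":
            link.send(("pkt", i))
            armed = True
        elif action == "receive":
            link.receive()
        else:
            link.free()
            armed = True
        if armed and not link.drop_free():
            return False
    return True


if __name__ == "__main__":
    for q0 in (None, "old"):
        for f0 in (False, True):
            link = UDL(q0, f0)
            print(q0, f0, random_run(link, 1000, random.Random(0)), link.dropped)
```

The case of an initial state with a stored packet and $freeq_u[v] = true$ shows why the lemma is phrased this way: one packet can be overwritten before the first FREE or SEND, but never afterwards.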



B.3 Time Bounds for Correction Phases 

Recall the definition of a correction phase on a link from Chapter 5. This section builds up to
a few important time bounds that are useful in the main body of the proof.

Only the last two lemmas are important. The remaining claims are only useful
in proving the last two lemmas. In the following claims, when we say that an event
occurs within time $t$ in execution $\alpha$, we mean that the event occurs within time $t$ of
the first state in $\alpha$.

The first claim says that a packet on a link will be delivered within $t_l$ time.
Claim B.3.1 For all executions $\alpha$ and any (u,v), if in the initial state $Q_{u,v} = p \neq nil$,
then a $RECEIVE_{u,v}(p)$ event occurs within $t_l$ time units.

The next claim says that a sender will know that a link is free within $2t_l$ time. Also,
after this period, the link will be drop-free. (Intuitively, it takes $t_l$ time units to deliver
any packet on the link, and $t_l$ time units to deliver the FREE signal to the sender.)

Claim B.3.2 For all executions $\alpha$ and any (u,v), within $2t_l$ time units there is a state
$s$ such that $s.freeq_u[v] = true$ and (u,v) is drop-free in $s$.

Proof: By Claim B.3.1, within $t_l$ time units after the first state of $\alpha$, there must be a
state $s$ such that $s.Q_{u,v} = nil$. Now we have two cases. First, suppose that within $t_l$
time units after $s$ a $SEND_{u,v}(p)$ event occurs. Then in the state preceding this event,
$freeq_u[v] = true$ and $Q_{u,v} = nil$, and we are done. On the other hand, if the first case
does not occur, then $Q_{u,v} = nil$ for $t_l$ time units after $s$. Hence within $t_l$ time units after
$s$ a $FREE_{u,v}$ action must occur, resulting in a state $s'$ such that $s'.freeq_u[v] = true$. By
Lemma B.2.2, (u,v) is drop-free in $s'$. ∎

The next claim bounds the time before a response will be sent. Note that in the 
code, responses are sent continuously even without waiting for a request. This helps 
avoid deadlock in arbitrary initial states. We rely on the matching process to weed 
out meaningless responses. 

Claim B.3.3 Response Time: For all executions $\alpha$ and any leader edge (u,v), there
must be some $SEND_{v,u}(p_{resp})$ action in $\alpha$ that occurs within $3t_n + 4t_l$ time units.

Proof: From Claim B.3.2, within $2t_l$ time there is a state $s$ such that $s.freeq_v[u] = true$.
If $queue_v[u]$ is empty for $t_n$ time units after $s$, then a $SEND_{v,u}(p_{resp})$ will occur in this
period. If not, $queue_v[u]$ becomes non-empty within time $t_n$ after $s$, which enables the
sending of a data packet. In this case, within $2t_n$ time units after $s$, a $SEND_{v,u}(p_{data})$ will
occur, resulting in a state $\tilde{s}$ in which $turn_v[u] = response$. From Claim B.3.2, within $2t_l$ time
units after $\tilde{s}$, there is a state $s'$ such that $s'.freeq_v[u] = true$ and $turn_v[u] = response$.
Finally, within $t_n$ time units after $s'$, a $SEND_{v,u}(p_{resp})$ will occur. ∎

The next claim bounds the time before a new phase will start on a link. 
Figure B.1: Summary of the proof of Claim B.3.5.

Claim B.3.4 For all executions $\alpha$ and any leader edge (u,v), within $2t_n + 2t_l$ time
units there is a state in which $phase_u[v] = true$.

Proof: Assume $phase_u[v] = false$ in the initial state of $\alpha$, or we are done trivially.
From Claim B.3.2, within $2t_l$ time there is a state $s$ such that $s.freeq_u[v] = true$. If
$queue_u[v]$ is empty for $t_n$ time units after $s$, then a $SEND_{u,v}(p_{req})$ action occurs within
this period, resulting in a state in which $phase_u[v] = true$. On the other hand, suppose
that $queue_u[v]$ becomes non-empty within time $t_n$ after $s$. Then a $SEND_{u,v}(p_{data})$ event
will occur within $2t_n$ time units after $s$, resulting in a state in which $phase_u[v] = true$. ∎

The next claim bounds how long it can take for a phase on a link to complete once 
it has started. 

Claim B.3.5 For all $\alpha$ and any leader edge (u,v), within $4t_n + 10t_l$ time units there
is a state in which $phase_u[v] = false$.



Proof: The proof is summarized in Figure B.1. We assume the claim is false, so
$phase_u[v] = true$ for $4t_n + 10t_l$ time units after the start of $\alpha$. First, from Claim B.3.2,
within $2t_l$ time there is a state $s$ such that both (u,v) and (v,u) are drop-free in $s$. Thus
any requests and responses that are sent in states after $s$ will be delivered to the
corresponding receiver.

From Claim B.3.2, within $2t_l$ time after $s$, there is a state $s'$ such that $s'.freeq_u[v] = true$.
Thus a $SEND_{u,v}(p_{req})$ action is enabled in $s'$, and within $t_n$ time units after
$s'$ a $SEND_{u,v}(p_{req})$ action occurs. Within $t_l$ time units after this $SEND_{u,v}(p_{req})$ action, a
$RECEIVE_{u,v}(p_{req})$ action occurs, resulting in a state, say $s''$. Also, in all states after $s''$,
$count_v[u] = count_u[v]$. Then by Claim B.3.3, within $3t_n + 4t_l$ after $s''$ a $SEND_{v,u}(p_{resp})$
event occurs with $p_{resp}.count = count_u[v]$. Finally, within another $t_l$ time units after this
event, a $RECEIVE_{v,u}(p_{resp})$ event occurs with $p_{resp}.count = count_u[v]$. In the state
immediately after this event, $phase_u[v] = false$. This contradicts the assumption that
$phase_u[v] = true$ for $4t_n + 10t_l$ time units after the start of $\alpha$. ∎
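The constants in Claims B.3.3 and B.3.5 are obtained by adding the step bounds along the longest path of each proof. The following tally only restates the steps given above (for Claim B.3.3, the worst case is the branch in which a data packet is sent first):

```latex
\begin{aligned}
\text{Claim B.3.3:}\quad
  & \underbrace{2t_l}_{freeq_v[u]} + \underbrace{2t_n}_{SEND_{v,u}(p_{data})}
  + \underbrace{2t_l}_{freeq_v[u]\ \text{again}} + \underbrace{t_n}_{SEND_{v,u}(p_{resp})}
  = 3t_n + 4t_l \\
\text{Claim B.3.5:}\quad
  & \underbrace{2t_l}_{\text{links drop-free}} + \underbrace{2t_l}_{freeq_u[v]}
  + \underbrace{t_n}_{SEND_{u,v}(p_{req})} + \underbrace{t_l}_{RECEIVE_{u,v}(p_{req})}
  + \underbrace{3t_n + 4t_l}_{\text{Claim B.3.3}} + \underbrace{t_l}_{RECEIVE_{v,u}(p_{resp})}
  = 4t_n + 10t_l
\end{aligned}
```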

All the previous claims are only useful in proving Lemma 5.6.5 and Lemma 5.6.6. 
We now prove Lemma 5.6.6. 

Proof: First assume that $l(u,v) = u$. Now we consider two cases. If $s_0.phase_u[v] = false$,
then by Claim B.3.4, within $2t_n + 2t_l$ time after $s_0$, there is a state $s_i$ in which
$s_i.check_u[v] = true$. However, the action before $s_i$ must be a $SEND_{u,v}(p_{data})$ event. (The
only other action that sets $check_u[v] = true$ is a $SEND_{u,v}(p_{req})$ action, but such an action
is not enabled if $queue_u[v]$ is non-empty and $phase_u[v] = true$.) If $s_0.phase_u[v] = true$,
then by Claim B.3.5, within $4t_n + 10t_l$ after $s_0$ we reach a state in which $phase_u[v] = false$,
and we are back to the previous case. Thus in both cases, a $SEND_{u,v}(p_{data})$ occurs
within $t_p$ after $s_0$.

Now suppose that $l(u,v) = v$. Once again we have two cases. Suppose $s_0.turn_u[v] = data$.
From Claim B.3.2, within $2t_l$ time after $s_0$, there is a state $s_i$ such that
$s_i.freeq_u[v] = true$ and $s_i.turn_u[v] = data$. Thus within $t_n$ time after $s_i$ a $SEND_{u,v}(p_{data})$
event occurs. If $s_0.turn_u[v] \neq data$, then by Claim B.3.3 (applied to the leader edge (v,u)),
within at most $3t_n + 4t_l$ after $s_0$, a $SEND_{u,v}(p_{resp})$ event occurs that results in a state in which
$turn_u[v] = data$. Then we are back to the first case. In both cases, a $SEND_{u,v}(p_{data})$
occurs within $t_p$ after $s_0$. ∎
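The two cases of the proof of Lemma 5.6.6 can be tallied the same way, again by adding only the bounds quoted in the proof. The definition of $t_p$ is given in Chapter 5 and is not repeated here; the sums below only show what the argument requires of it, namely that $t_p$ is at least the larger of the two totals.

```latex
\begin{aligned}
l(u,v) = u:\quad & \underbrace{4t_n + 10t_l}_{\text{Claim B.3.5}}
  + \underbrace{2t_n + 2t_l}_{\text{Claim B.3.4}} = 6t_n + 12t_l \\
l(u,v) = v:\quad & \underbrace{3t_n + 4t_l}_{\text{Claim B.3.3}}
  + \underbrace{2t_l}_{\text{Claim B.3.2}} + \underbrace{t_n}_{SEND_{u,v}(p_{data})}
  = 4t_n + 6t_l
\end{aligned}
```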

B.4 Proof that Clean Edges remain Clean 

The following claim is useful in determining how and when the counter values stored 
in the links and the receiver can change. Recall that we used undefined when there is 
no counter value stored on a link.

Claim B.4.1 In any execution, and for any leader edge (u,v), the only action that
can add a new element that is not undefined to countset(u,v) is a $SEND_{u,v}(p_{req})$ action.

Proof: The only other actions that affect countset(u,v) are $RECEIVE_{u,v}(p_{req})$,
$SEND_{v,u}(p_{resp})$, and $RECEIVE_{v,u}(p_{resp})$, none of which can add any new values other
than undefined. ∎

A nice property is that once an edge becomes clean, it remains clean. This is stated 
in Claim 5.6.11. We now prove Claim 5.6.11. 

Proof: We assume that leader edge (u,v) is clean in $s$, and we show that it remains
clean in $\tilde{s}$ for any transition $(s, \pi, \tilde{s})$. We will refer to the five predicates in the definition
of a clean link by their numbers. The stability of the first predicate follows directly from Lemma B.2.1.

If $s.phase_u[v] = false$ and $s.check_u[v] = false$, then no $SEND_{u,v}(p_{req})$ action can occur,
and by Claim B.4.1 this is the only action that can add a new value to countset(u,v). Thus the
second predicate holds and the remaining three hold trivially. If $s.phase_u[v] = false$ and
$s.check_u[v] = true$, then $\pi$ must be a $SEND_{u,v}(p_{req})$ or a $SEND_{u,v}(p_{data})$ event, which will
cause Predicate 3 to hold. Also, by Predicate 2 in state $s$, $s.count_u[v] \neq s.respcount(u,v)$
and $s.count_u[v] \neq s.count_v[u]$. Since a $SEND_{u,v}(p_{req})$ or a $SEND_{u,v}(p_{data})$ event does not
change $count_u[v]$, $respcount(u,v)$, or $count_v[u]$, Predicates 4 and 5 hold in $\tilde{s}$; also
Predicate 2 holds trivially.

If $s.phase_u[v] = true$ and $s.check_u[v] = false$, then $\pi$ must be a $RECEIVE_{v,u}(p_{resp})$
action with $s.count_u[v] = respcount(u,v)$, and $\tilde{s}.count_u[v] = (s.count_u[v] + 1) \bmod 4$.
Then we can use the fact that Predicates 3 and 4 hold in $s$ to infer that Predicate 2
holds in $\tilde{s}$. Also, Predicates 3, 4, and 5 hold trivially in $\tilde{s}$. If $s.phase_u[v] = true$ and
$s.check_u[v] = true$, then the only actions of interest are $SEND_{u,v}(p_{req})$ (which makes
Predicate 3 true and leaves the others true), $SEND_{v,u}(p_{resp})$ (which makes Predicate 4
true and leaves the others true), and $RECEIVE_{u,v}(p_{req})$. For a $RECEIVE_{u,v}(p_{req})$, we
can use the fact that Predicate 3 holds in $s$ to infer that Predicate 4 holds in $\tilde{s}$. All
other predicates hold trivially. Note that $\pi$ cannot be a $SEND_{u,v}(p_{data})$ event because
$s.phase_u[v] = true$. ∎
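The role of the counters can be illustrated with a small sequential sketch of one request/response exchange on a leader edge (u,v). The classes below are hypothetical simplifications, not the code of Figures 5.6 and 5.7: they keep only the counters (taken modulo 4, as in the update $(count_u[v] + 1) \bmod 4$ above), the leader's phase flag, and the rule that a response ends the phase only if its counter equals $count_u[v]$. The second run shows why the second predicate of a clean link matters: if a stale response already carries the leader's current counter, it is mistaken for the real one.

```python
from dataclasses import dataclass

MOD = 4  # counter values are taken modulo 4


@dataclass
class Leader:
    count: int = 0       # count_u[v]
    phase: bool = False  # phase_u[v]

    def send_request(self):
        self.phase = True
        return self.count  # p_req.count

    def receive_response(self, resp_count):
        """End the phase only on a response that carries the current counter."""
        if self.phase and resp_count == self.count:
            self.phase = False
            self.count = (self.count + 1) % MOD
            return True
        return False  # stale or meaningless response: ignored


@dataclass
class Follower:
    count: int = 0  # count_v[u]

    def receive_request(self, req_count):
        self.count = req_count

    def response(self):
        return self.count  # p_resp.count


def run_phase(leader, follower, stale_responses):
    """One phase in which stale responses reach u before the fresh one.

    Returns the stale counters that were (wrongly) accepted and whether the
    fresh response was accepted.
    """
    req_count = leader.send_request()
    accepted = []
    for c in stale_responses:
        if leader.receive_response(c):
            accepted.append(c)
    follower.receive_request(req_count)
    fresh_ok = leader.receive_response(follower.response())
    return accepted, fresh_ok


if __name__ == "__main__":
    # Clean start: no stale counter equals count_u[v] = 2.
    print(run_phase(Leader(count=2), Follower(count=0), [1, 3, 0]))  # ([], True)
    # Unclean start: a stale response already carries counter 2.
    print(run_phase(Leader(count=2), Follower(count=0), [2]))        # ([2], False)
```

The sketch models only the matching rule; the argument that links eventually become clean is the one given in Chapter 5 and is not reproduced here.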
B.5 How Links Become Quiet: Detailed Proofs

Recall the definition of a quiet link from Chapter 5. For (u,v) to be quiet, we want
to show not only that $L_{u,v}$ holds by the end of the second (u,v) phase but also that
no more "reset" actions can occur after this point, so that $L_{u,v}$ will remain true. This
motivates the following definition of a reset transition.

Definition B.5.1 Consider any transition $(s, \pi, \tilde{s})$ of $\mathcal{N}^+$. We say that this transition
is a reset transition at node u with respect to node x if:

• $s.mode_u[x] = reset$ AND

• Either $\pi$ is a $SEND_{u,x}(p_{resp})$, or $\pi$ is a $RECEIVE_{x,u}(p_{resp})$ and $l(u,x) = x$.

Intuitively, a reset transition at node u causes the local reset function $f$ to be
applied to the state of u.

We start by proving Lemma 5.6.17, which is the stability condition for a quiet link.

Proof: We will show that each of the five predicates used to define a quiet link (Definition 5.6.15)
holds in $\tilde{s}$. We will refer to the five predicates using the numbers given in Definition
5.6.15. The first predicate holds in $\tilde{s}$ because of Claim 5.6.11.

Consider the second predicate. We wish to show that $(\tilde{s}|u, \tilde{s}|(u,v), \tilde{s}|(v,u), \tilde{s}|v) \in L_{u,v}$.
We know that the second predicate holds in $s$. We also know that if $\pi$ is any
action of $\mathcal{N}$, then the second predicate holds in $\tilde{s}$ as well, because $L_{u,v}$ is a closed
predicate by definition. The only other transitions of $\mathcal{N}^+$ that can affect $L_{u,v}$ are reset
transitions at u or v.

Suppose $(s, \pi, \tilde{s})$ is a reset transition at u. From the fact that the third predicate
holds in $s$, it is easy to see that $(s, \pi, \tilde{s})$ cannot be a reset transition at node u with
respect to v. Suppose $(s, \pi, \tilde{s})$ is a reset transition at node u with respect to some
neighbor $x \neq v$. Let e denote the leader edge corresponding to edge (u,x). Then e
is not quiet in state $s$. From the hypothesis, we infer that $\{u,x\} \not< \{u,v\}$. Thus by
the stability condition for local correctability, and since $\tilde{s}|u = f(s|u, x)$, we infer that
the second predicate holds in $\tilde{s}$ even if $(s, \pi, \tilde{s})$ is a reset transition at u. A similar
argument works if $(s, \pi, \tilde{s})$ is a reset transition at v.
We can infer that the third predicate holds in $\tilde{s}$ from the fact that the third, fourth,
and fifth predicates hold in $s$. The only actions to consider are $RECEIVE_{v,u}(p_{resp})$ and
$RECEIVE_{u,v}(p_{req})$. We can infer that the fourth predicate holds in $\tilde{s}$ from the fact that
the third and fourth predicates hold in $s$. The only action to consider is a $SEND_{u,v}(p_{req})$
action.

Consider the fifth predicate. Let X be the predicate "$Q_{v,u} = p_{resp}$ and $p_{resp}.count = count_u[v]$".
The only actions that can change the truth or falsity of predicate X are
$SEND_{v,u}(p_{resp})$ and $RECEIVE_{v,u}(p_{resp})$. Suppose that either $\pi$ is a $SEND_{v,u}(p_{resp})$ action
and $s.count_u[v] \neq s.count_v[u]$, or $\pi$ is a $RECEIVE_{v,u}(p_{resp})$ action. Then X will become
false in $\tilde{s}$ and the fifth predicate holds trivially. On the other hand, suppose $\pi$ is
a $SEND_{v,u}(p_{resp})$ action and $s.count_u[v] = s.count_v[u]$. Then X will become true in $\tilde{s}$.
However, in this case, from the definition of a clean link, $\tilde{s}.Q_{u,v} \notin P_{data}$. Thus $\pi$ will
make the fifth predicate hold in $\tilde{s}$.

Thus, suppose that X is true in $s$ and also in $\tilde{s}$. The only other actions to consider
are actions that change the state $s|u$. We know that if $\pi$ is any action of $\mathcal{N}$, then the
fifth predicate holds in $\tilde{s}$ as well, because $L_{u,v}$ is a closed predicate by definition. The
only other transition of $\mathcal{N}^+$ that can affect the fifth predicate is a reset transition at u.
Suppose $(s, \pi, \tilde{s})$ is a reset transition at node u with respect to some neighbor $x \neq v$.
Let e denote the leader edge corresponding to edge (u,x). Then e is not quiet in state
$s$. From the hypothesis, we infer that $\{u,x\} \not< \{u,v\}$. Thus by the stability condition
for local correctability, and since $\tilde{s}|u = f(s|u, x)$, we infer that the fifth predicate holds
in $\tilde{s}$ even if $(s, \pi, \tilde{s})$ is a reset transition at u. ∎

Next we show the required liveness condition: that a leader edge (u,v) will become 
quiet in bounded time if all leader edges less than (u,v) are already quiet. 

In the rest of this section, we prove Lemma 5.6.18, which describes how and when 
links become quiet. A quick overview of this section can be obtained by skimming 
Claim B.5.4, Figure B.2, Definition B.5.7 and the last three lemmas in the section. 

We start by proving some more detailed properties of the local snapshot and reset
protocols during a clean phase. For all the claims and lemmas in this section, we fix
an execution $\alpha$ of $\mathcal{N}^+$ and a leader edge (u,v). When we refer to a (u,v) phase we
mean a (u,v) phase in $\alpha$. When we refer to the code, we mean the code in Figures 5.6
and 5.7.
Figure B.2: More detailed structure of a clean phase

Since neither the mode nor the counter at the leader can change except at the end
of a phase, it makes sense to talk of the mode and counter value of a phase.

Definition B.5.2 Consider any clean (u,v) phase $\mathcal{P}$ with first state $r$. Then we use
$mode(\mathcal{P})$ to denote $r.mode_u[v]$ and $count(\mathcal{P})$ to denote $r.count_u[v]$.

Claim B.5.3 For all states $s$ in a clean (u,v) phase $\mathcal{P}$ except the last state, $s.count_u[v] = count(\mathcal{P})$
and $s.mode_u[v] = mode(\mathcal{P})$.

Proof: From the code, the values of $count_u[v]$ and $mode_u[v]$ can only change at the
end of the phase. ∎

Figure B.2 shows the structure of a clean phase in more detail for some leader 
edge (u,v). The next claim formalizes the intuition behind Figure B.2 and defines two 
important states within a phase: the midpoint and the penultimate state. 

Claim B.5.4 Any clean (u,v) phase $\mathcal{P}$ contains the following four actions in the order
shown:

• A $SEND_{u,v}(p_{req})$ action with $p_{req}.count = count(\mathcal{P})$ and $p_{req}.mode = mode(\mathcal{P})$.

• A $RECEIVE_{u,v}(p_{req})$ action with $p_{req}.count = count(\mathcal{P})$ and $p_{req}.mode = mode(\mathcal{P})$.

• Exactly one $SEND_{v,u}(p_{resp})$ action with $p_{resp}.count = count(\mathcal{P})$ and
$p_{resp}.nodestate = m|v$, where m is the state immediately following this action.
We will call m the midpoint of the phase.

• Exactly one $RECEIVE_{v,u}(p_{resp})$ action with $p_{resp}.count = count(\mathcal{P})$ and $p_{resp}.nodestate = m|v$,
where m is the midpoint of the phase. We will call the state immediately
before this action the penultimate state of the phase.

Proof: The four actions are shown in Figure B.2 as a, b, c, and d respectively. The
midpoint and penultimate states are also marked.

Informally, the claim follows from two facts. First, since both (u,v) and (v,u) are
drop-free in all states of $\mathcal{P}$, any $SEND_{u,v}(p)$ action in $\mathcal{P}$ must be followed by a state
in which $Q_{u,v} = p$ (i.e., any packet sent on channel $C_{u,v}$ will be stored in the link).
A similar statement holds for link (v,u). The second fact is that, by definition, at
the start of a clean phase, $count_u[v] \notin countset(u,v)$. Thus at the start of the phase,
there are no potentially confusing requests or responses numbered with the value of
the counter at the sender. This ensures, for instance, that when a response $p_{resp}$ is
received at u with $p_{resp}.count = count_u[v]$, then $p_{resp}$ was sent in the phase. Similarly,
it ensures that when a request $p_{req}$ is received with $p_{req}.count = count_u[v]$, then the
request was sent in the phase.

The formal argument is quite tedious. We start by proving the last statement in
the claim and then work backwards to prove the other three. We sketch the first
part of the argument.

We know that $\mathcal{P}$ can only end with a $RECEIVE_{v,u}(p_{resp})$ with $p_{resp}.count = count_u[v]$.
Thus in the state $s'$ before this action, $respcount(u,v) = count_u[v]$. But in the first
state of $\mathcal{P}$, we know that since $\mathcal{P}$ is clean, $respcount(u,v) \neq count_u[v]$. Thus there must
have been a $SEND_{v,u}(p_{resp})$ action in $\mathcal{P}$. Also, there can be only one such action in $\mathcal{P}$:
any earlier $SEND_{v,u}(p_{resp})$ action would have caused the phase to end earlier;
no later $SEND_{v,u}(p_{resp})$ action can occur because $Q_{v,u} \neq nil$ for all remaining states in
the phase except the last state. If the state after the only $SEND_{v,u}(p_{resp})$ action is m,
then from the code $p_{resp}.nodestate = m|v$. Similar arguments can be used to show the
remainder of the claim. ∎
In Figure B.2, in all states following b and before action c, $mode_v[u] = mode_u[v]$.
Informally, this is because $mode_v[u]$ is set to the mode carried in the request packet and
cannot change until action c occurs, at which point $mode_v[u]$ is changed to snapshot.
Then $mode_v[u]$ remains at this value till the end of the phase. Formally:

Claim B.5.5 Consider a clean (u,v) phase $\mathcal{P}$ with midpoint m. Then in the state
before m, $mode_v[u] = mode_u[v]$, and in all remaining states in $\mathcal{P}$, $mode_v[u] = snapshot$.

Proof: After the first $RECEIVE_{u,v}(p_{req})$ action in $\mathcal{P}$, $mode_v[u]$ becomes equal to $mode_u[v]$.
The value of $mode_v[u]$ can only change due to $RECEIVE_{u,v}(p_{req})$ actions and $SEND_{v,u}(p_{resp})$
actions. However, future $RECEIVE_{u,v}(p_{req})$ actions in $\mathcal{P}$ cannot change $mode_v[u]$
because, from the definition of a clean link, $p_{req}.count = count_v[u]$. After the first (and
only, see Claim B.5.4) $SEND_{v,u}(p_{resp})$ action, $mode_v[u] = snapshot$, and it remains at
that value till the end of the phase. ∎

The following definition is convenient: 

Definition B.5.6 The leader edge corresponding to an edge (u,v) is (u,v) if $l(u,v) = u$
and (v,u) if $l(v,u) = v$.

In order to guarantee correction or checking at the end of a (u,v) phase, we need
restrictions on the states of links adjacent to either u or v. Otherwise, concurrent
checking/correction on these adjacent links may invalidate the checking/correction
done in the (u,v) phase.

Definition B.5.7 We will call a (u,v) phase $\mathcal{P}$ well-behaved if $\mathcal{P}$ is clean and for
all states $s$ in the phase and for all $\{w,x\} < \{u,v\}$, the leader edge corresponding to
$\{w,x\}$ is quiet in $s$.

Now in a well-behaved phase, the response contains the state of v at the midpoint.
However, the response is received in the penultimate state. Despite the fact that the
state of v recorded in the response is "old", the recorded state is still useful in the
following precise sense.

Claim B.5.8 Phase Invariant: Consider any clean (u,v) phase $\mathcal{P}$ with midpoint
m and penultimate state $s'$. Then for all states r in the interval $[m, s']$, the following
predicates hold in r:

• $r.Q_{v,u} = p_{resp}$ and $p_{resp}.count = count(\mathcal{P})$.

• For any y: if $(y, nil, nil, p_{resp}.nodestate) \in L_{u,v}$, then $(y, nil, nil, r|v) \in L_{u,v}$.

Proof: The first part of the claim follows directly from Claim B.5.4. We establish the
second predicate by induction on the length of the interval $[m, s']$. First, this predicate
is true in m because $m.Q_{v,u} = p_{resp}$ and $p_{resp}.nodestate = m|v$ by Claim B.5.4.

Next, assume the second predicate holds in the state $r'$ immediately before r and consider the
transition $(r', \pi, r)$. The only actions of interest are the actions that change the state
of v. If $\pi$ is any action of $\mathcal{N}$, then the second predicate holds in r as well, because $L_{u,v}$ is
a closed predicate. The only other transitions of $\mathcal{N}^+$ that can affect the state of v are
reset transitions at v. We first see that $(r', \pi, r)$ cannot be a reset transition at node
v with respect to u. Suppose $(r', \pi, r)$ is a reset transition at node v with respect to
some neighbor $x \neq u$. Let e denote the leader edge corresponding to $\{v,x\}$. Then e is
not quiet in state $r'$. But since the phase is well-behaved, we infer that $\{v,x\} \not< \{u,v\}$.
Thus by the stability condition for local correctability, and since $r|v = f(r'|v, x)$, we
infer that the second predicate holds in r. ∎

We can now state two lemmas that explain why the local snapshot and reset
procedures work correctly. Let us call a snapshot phase a well-behaved phase $\mathcal{P}$ such that
$mode(\mathcal{P}) = snapshot$. Similarly, a reset phase is a well-behaved phase $\mathcal{P}$ such that
$mode(\mathcal{P}) = reset$. The first lemma states that if $L_{u,v}$ does not hold at the end of a
snapshot (u,v) phase, then at the end of the phase $mode_u[v] = reset$. This ensures
that the next (u,v) phase will be a reset phase.

Lemma B.5.9 For any snapshot (u,v) phase $\mathcal{P}$ with last state $s$, the following is true.
If $(s|u, s|(u,v), s|(v,u), s|v) \notin L_{u,v}$, then $s.mode_u[v] = reset$.

Proof: Let $s'$ be the penultimate state, immediately before $s$ in $\mathcal{P}$. The action just
before $s$ is a $RECEIVE_{v,u}(p_{resp})$ action. Since $s'$ is the penultimate state of a snapshot
phase, $mode_u[v] = snapshot$, $Q_{v,u} = p_{resp}$, and $p_{resp}.count = count_u[v]$ in $s'$.

Thus, from the code, $s|u = s'|u$ and $s|v = s'|v$ (i.e., the basic states of nodes u and v
remain unchanged after the $RECEIVE_{v,u}(p_{resp})$ action). Next, from the fact that (u,v)
is clean in $s'$, we deduce that $s|(u,v) = s'|(u,v) = nil$. Also, after a $RECEIVE_{v,u}(p_{resp})$
action, $s|(v,u) = nil$.
Thus if $(s|u, s|(u,v), s|(v,u), s|v) \notin L_{u,v}$, then $(s'|u, nil, nil, s'|v) \notin L_{u,v}$. But by the
phase invariant (Claim B.5.8), if $(s'|u, nil, nil, s'|v) \notin L_{u,v}$, then we can be sure that
$(s'|u, nil, nil, p_{resp}.nodestate) \notin L_{u,v}$. Thus if $(s|u, s|(u,v), s|(v,u), s|v) \notin L_{u,v}$, then
$(s'|u, nil, nil, p_{resp}.nodestate) \notin L_{u,v}$. Hence, from the code of the $RECEIVE_{v,u}(p_{resp})$
action, it is easy to see that $s.mode_u[v] = reset$. ∎

The second lemma states that at the end of a (u,v) reset phase, (u,v) is quiet. 

Lemma B.5.10 For any reset (u,v) phase $\mathcal{P}$ with last state $s$, (u,v) is quiet in $s$.

Proof: Let $s'$ be the state immediately before $s$ in $\mathcal{P}$. The action just before $s$ must
be a $RECEIVE_{v,u}(p_{resp})$ action. Since $s'$ is the penultimate state of a reset phase,
$mode_u[v] = reset$, $Q_{v,u} = p_{resp}$, and $p_{resp}.count = count_u[v]$ in $s'$. Let m be the
midpoint of $\mathcal{P}$ and $m'$ the state immediately before m in $\mathcal{P}$. By Claim B.5.5,
$m'.mode_v[u] = m'.mode_u[v] = reset$, and hence $p_{resp}.nodestate = f(m'|v, u)$.

Now, from the correction property of a local reset function, $(f(s'|u, v), nil, nil, f(m'|v, u)) \in L_{u,v}$.
Since $p_{resp}.nodestate = f(m'|v, u)$, $(f(s'|u, v), nil, nil, p_{resp}.nodestate) \in L_{u,v}$.
But by the phase invariant (Claim B.5.8), this implies that $(f(s'|u, v), nil, nil, s'|v) \in L_{u,v}$.
By similar arguments as in the proof of the previous lemma, $s|(u,v) = nil$,
$s|(v,u) = nil$, and $s|v = s'|v$. However, since $s'.mode_u[v] = reset$, $s|u = f(s'|u, v)$.
Together, these equations imply that $(s|u, s|(u,v), s|(v,u), s|v) \in L_{u,v}$.

Next, s.mode_u[v] = s.mode_v[u] = snapshot by Claim B.5.3 and Claim B.5.5. Also,
since (u,v) is clean in state s', if s.Q_{u,v} = p_req, then p_req.count = s.count_v[u]. It is
now easy to verify that all five predicates used in the definition of a quiet link
(Definition 5.6.15) hold in state s. ∎

We are now ready to prove Lemma 5.6.18; the proof is almost immediate from
the last two lemmas. Recall the statement of Lemma 5.6.18: if every leader edge
(w,x) < (u,v) is quiet in some state s_i of some execution α of M + \C, then (u,v) is
quiet in some state that occurs within 3·t_p time of s_i in α.

Proof: First, from the hypothesis and Lemma 5.6.17, every (u,v) phase in α is
well-behaved. Consider the first (u,v) phase V in α that starts after state s_i. If mode(V) =
reset, then we know from Lemma B.5.10 that at the end of V, (u,v) is quiet. If
mode(V) = snapshot, then we know from Lemma B.5.9 that at the end of V, mode_u[v] =
reset. Consider the next (u,v) phase in α, say V'. Since mode_u[v] only changes at the
end of a phase, mode(V') = reset. Thus by Lemma B.5.10, (u,v) is quiet at the end
of V'.

Hence there is some state s_j that occurs before the end of the second (u,v) phase
that follows s_i in α and such that (u,v) is quiet in s_j. Thus by the phase rate lemma,
Lemma 5.6.5, s_j must occur within 3·t_p time units after s_i in α. ∎
Appendix C

The AAG reset protocol 



In this appendix, we describe why three phases were used in the original AAG protocol
[AAG87] and the changes that were required to convert the AAG protocol
into the reset protocol described in Chapter 7. We also show that the mating relation
provided by the AAG protocol is not transitive.



C.1 Why three phases are used in the AAG protocol

The AAG reset protocol [AAG87] is much more conservative than the Simple Reset
Protocol about allowing a node to return to Ready mode.

The point of all the conservatism in the AAG protocol is as follows. The use of 
three phases ensures that if reset requests stop being made, all nodes will eventually 
return to Ready mode. The AAG protocol makes this guarantee even in dynamic 
networks. The additional rules the AAG protocol uses to work in dynamic networks 
are remarkably simple. Suppose a link from node u to node v fails. If node v is node
u's parent (in the abort tree), node u takes over as the root of the abort tree; if node
u is expecting an ack from node v, node u assumes it has received an ack from v. Finally,
when a link from u to v comes up, nothing special is done!

Here is an intuitive explanation of why three phases are used in the AAG protocol.
Each execution α of the reset protocol can be used to induce work intervals at
nodes. We call a work interval at a node u a maximal subsequence of α during
which node u is not in Ready mode. Thus, from the point of view of node u, each
execution α can be divided into alternating work intervals and Ready intervals (intervals
during which u is in Ready mode). Now each work interval at node u can be considered
to be "caused" either by an ABORT packet from a neighbor v (in which case we say that the
work interval at u has as its parent a work interval at v) or by a reset request (in
which case we say that the work interval at u has no parent). Thus, starting from an
execution α, we can assign each work interval at each node to a tree of work intervals.
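As an illustration only, the sketch below extracts the work intervals of a single node from a trace of its modes. It assumes a hypothetical encoding in which the execution α has been reduced to the list of modes of u, one entry per state, using the mode names "Ready", "Abort", and "Converge"; the function name is likewise hypothetical.

```python
from typing import List, Optional, Tuple

def work_intervals(modes: List[str]) -> List[Tuple[int, int]]:
    """Maximal runs of non-Ready states, as inclusive (start, end) state indices."""
    intervals: List[Tuple[int, int]] = []
    start: Optional[int] = None                   # index where the open interval began
    for i, mode in enumerate(modes):
        if mode != "Ready" and start is None:
            start = i                             # u leaves Ready: a work interval opens
        elif mode == "Ready" and start is not None:
            intervals.append((start, i - 1))      # u is Ready again: close the interval
            start = None
    if start is not None:                         # trace ends while u is still working
        intervals.append((start, len(modes) - 1))
    return intervals

trace = ["Ready", "Abort", "Abort", "Converge", "Ready", "Ready", "Abort", "Ready"]
print(work_intervals(trace))  # prints [(1, 3), (6, 6)]
```

The gaps between the returned intervals are exactly the Ready intervals of u.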

We claim that each tree of work intervals has a height of at most n, where n is 
the number of nodes in the graph. This follows if we can show that any work interval 
tree can contain at most one work interval from any given node. This brings us to 
the crucial observation. Consider any work interval tree T in α. Let I_r be the work
interval corresponding to the root. The use of three phases guarantees us that there 
is some state in I r that is contained in all the work intervals contained in T. This is 
the state in which r sends READY packets to all its children. But that implies that a 
given node u cannot have two distinct work intervals in T because these two disjoint 
work intervals do not share a common state. Hence the height of T is at most n. 
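The counting step can be checked on a small example. In the following sketch the dictionary encoding and the interval names are hypothetical: each work interval records the node that owns it and its parent interval. If no node owns two intervals of the tree, every root-to-leaf path visits distinct nodes, so the height cannot exceed n.

```python
from typing import Dict, Optional, Tuple

# Hypothetical encoding: interval name -> (owning node, parent interval or None for the root).
Tree = Dict[str, Tuple[str, Optional[str]]]

def depth(tree: Tree, interval: str) -> int:
    """Intervals on the path from `interval` up to the root, inclusive.
    Terminates on a well-formed tree, since a parent interval starts before its child."""
    d = 0
    cur: Optional[str] = interval
    while cur is not None:
        d += 1
        cur = tree[cur][1]
    return d

def height(tree: Tree) -> int:
    return max(depth(tree, iv) for iv in tree)

def one_interval_per_node(tree: Tree) -> bool:
    owners = [owner for owner, _ in tree.values()]
    return len(owners) == len(set(owners))

# Root interval at r; its ABORT packets cause intervals at a and c, and a's cause one at b.
tree: Tree = {"I_r": ("r", None), "I_a": ("a", "I_r"),
              "I_b": ("b", "I_a"), "I_c": ("c", "I_r")}
n = 4  # nodes r, a, b, c
print(one_interval_per_node(tree), height(tree), height(tree) <= n)  # prints True 3 True
```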

Thus, each root interval can "cause" at most one work interval at any node u. But 
each root interval corresponds to either a reset request (or to a link failure in the case 
of dynamic networks). Thus the number of work intervals at a node is at most the 
number of reset requests (plus the number of topology changes in the case of dynamic 
networks). Thus if the number of reset requests (and topology changes) is finite, there 
will only be a finite set of work intervals at each node. Next, it is possible to show
that each work interval terminates in O(n) time by using induction on the height of
the abort tree. We combine the last two observations to prove the causality property:
in finite time after all reset requests (and topology changes) stop, all nodes return
to Ready mode.

It is interesting to return to the Simple Reset Protocol (SRP). Consider an execution
α of SRP that begins in a "bad state" as shown in Figure 7.5. Suppose we construct
work interval trees from execution α. The result is that we get a single work interval
tree of infinite height. Each work interval ends in finite time but there are an infinite 
number of work intervals at each node! 
C.2 Overview of the changes required for stabilization

In a non-stabilizing setting (e.g., [AAG87]), buffers like buffer_u[v] are modelled by
unbounded queues. However, just as in the case of links, stabilizing reset protocols
must use bounded-size queues if they are to stabilize in bounded time. Our solution
is for v to use a single buffer to store messages from u, and to require that v send a
special S-ACK packet whenever it removes a message from the buffer. This special
packet is not needed in the original AAG protocol, which uses unbounded queues.
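A minimal sketch of the single-buffer discipline follows. The class and method names (Sender, Receiver, on_s_ack) are hypothetical, and each direction of the link is modelled as a FIFO deque. The flag freem plays the role of the variable freem_u[v] used in Appendix D: u hands over a new message only after the S-ACK for the previous one has returned, so v never needs more than one buffer slot.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

Packet = Tuple[str, Optional[str]]   # (packet kind, payload)

class Sender:
    """Node u's end of the u -> v message channel (hypothetical names)."""

    def __init__(self) -> None:
        self.freem = True            # True when v's single buffer is known to be free

    def send(self, fwd: Deque[Packet], m: str) -> bool:
        if not self.freem:
            return False             # previous message not yet acknowledged: refuse
        fwd.append(("MSG", m))
        self.freem = False
        return True

    def on_s_ack(self) -> None:
        self.freem = True            # v has emptied its buffer

class Receiver:
    """Node v's end: one buffer slot for messages from u."""

    def __init__(self) -> None:
        self.buffer: Optional[str] = None
        self.delivered: List[str] = []

    def on_msg(self, pkt: Packet) -> None:
        kind, m = pkt
        assert kind == "MSG" and self.buffer is None, "single-slot buffer would overflow"
        self.buffer = m

    def deliver(self, back: Deque[Packet]) -> None:
        if self.buffer is not None:
            self.delivered.append(self.buffer)
            self.buffer = None
            back.append(("S-ACK", None))   # report that the slot is free again

u, v = Sender(), Receiver()
fwd: Deque[Packet] = deque()
back: Deque[Packet] = deque()
first = u.send(fwd, "m1")
second = u.send(fwd, "m2")           # refused: no S-ACK has returned yet
v.on_msg(fwd.popleft())
v.deliver(back)
if back.popleft()[0] == "S-ACK":
    u.on_s_ack()
third = u.send(fwd, "m2")            # accepted now
print(first, second, third, v.delivered)  # prints True False True ['m1']
```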

The first step in making this protocol stabilizing is to make it locally checkable. A 
clear problem with the AAG protocol is that it will deadlock if in the initial state some 
parent edges form a cycle. As in stabilizing spanning tree algorithms [AKY90, AG90], 
we mend this flaw by maintaining a distance variable at each node, such that a node's 
distance is one greater than that of its parent. Specifically, distance is initialized to 
upon reset request, and its accumulated value is appended to the abort packets. Thus 
we encode an abort packet as a tuple (ABORT, d), where d is a distance. 

Next, we list all the local predicates that are necessary to ensure correct operation 
of the Reset protocol. Note that a significant advantage of our approach is that all we 
have to do is to prove that all local predicates eventually hold. Once we do that we can 
rely on the correctness of the original protocol. However, since a rigorous correctness 
argument for the original protocol did not exist (as far as we knew), we produced a 
correctness argument anyway. 

To ensure that the local predicates are closed predicates, we use the heuristic of
removing unexpected packet transitions. Recall from Chapter 6 that to do this we
add checks before processing any packet arriving at the node. We check whether this
packet could have possibly been sent when the link subsystem (on which the packet
arrived) is in a good state. Two checks we have added (which were not needed in the
AAG protocol) are: when an ACK arrives, a node checks whether the ACK is expected
before processing it; when a READY packet arrives, a node accepts it only if it is in
Converge mode and the packet has come from the node's parent.
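The two added checks can be sketched as a guard that runs before a packet is processed. The NodeState fields and the accept function are hypothetical; awaiting_ack stands in for the per-neighbor ack flags, and in this sketch a packet that fails the guard is simply discarded.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

@dataclass
class NodeState:
    mode: str = "Ready"                                  # "Ready", "Abort", or "Converge"
    parent: Optional[str] = None
    awaiting_ack: Set[str] = field(default_factory=set)  # neighbors u still expects an ACK from

def accept(node: NodeState, kind: str, sender: str) -> bool:
    """Could a packet of this kind from `sender` arrive if the link subsystem were in a good state?"""
    if kind == "ACK":
        return sender in node.awaiting_ack              # an unexpected ACK is discarded
    if kind == "READY":
        return node.mode == "Converge" and node.parent == sender
    return True                                         # no extra check for other kinds here

u = NodeState(mode="Converge", parent="b")
print(accept(u, "READY", "b"), accept(u, "READY", "c"), accept(u, "ACK", "b"))  # prints True False False
```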

The use of the distance variable introduces a new problem. Since the distance field 
has a maximum value, say n' , we have to consider the case of a node u that receives an 
(ABORT, n') packet from neighbor v. If u is in Ready mode, u cannot simply accept v 
as its parent and set its own distance to be one greater than n' since n' is the maximum
value. Instead, in our code, u pretends that it has received a reset request before the 
(ABORT, n') packet. Thus u will first become a root and send ABORT packets to all 
its neighbors; then it will send back an ack to v. We call this a spurious reset request 
action. 
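The following sketch shows how a node might handle (ABORT, d) packets when the distance field is bounded by n'. It is not the code of Figure 7.7: the class, the outbox list, MAX_DIST (standing in for n') and ROOT_DIST are hypothetical, and in particular the distance a root gives itself is an assumption of the sketch. For simplicity, every ABORT is sent to all neighbors.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

MAX_DIST = 4    # stands in for n', the largest value the distance field can hold (illustrative)
ROOT_DIST = 0   # assumption: the distance a node gives itself when it starts a reset

@dataclass
class Node:
    name: str
    neighbors: List[str]
    mode: str = "Ready"
    parent: Optional[str] = None
    dist: int = 0
    awaiting_ack: Set[str] = field(default_factory=set)
    outbox: List[Tuple[str, tuple]] = field(default_factory=list)  # (destination, packet)

    def _flood_abort(self) -> None:
        # Enter Abort mode and send (ABORT, dist + 1) to every neighbor.
        self.mode = "Abort"
        for x in self.neighbors:
            self.awaiting_ack.add(x)
            self.outbox.append((x, ("ABORT", self.dist + 1)))

    def reset_request(self) -> None:
        # VR transition: a Ready node becomes the root of a new abort tree.
        if self.mode == "Ready":
            self.parent, self.dist = None, ROOT_DIST
            self._flood_abort()

    def on_abort(self, v: str, d: int) -> None:
        if self.mode == "Ready" and d < MAX_DIST:
            # VA transition: adopt v as parent; d is already one more than v's distance.
            self.parent, self.dist = v, d
            self._flood_abort()
        elif self.mode == "Ready":
            # (ABORT, n') received: spurious reset request first, then ack v as in an IA.
            self.reset_request()
            self.outbox.append((v, ("ACK",)))
        else:
            # IA transition: already busy, so acknowledge immediately.
            self.outbox.append((v, ("ACK",)))

s = Node("s", ["b", "c"])
s.on_abort("b", MAX_DIST)            # the distance field is saturated
print(s.parent, s.dist, s.outbox)    # prints None 0 [('b', ('ABORT', 1)), ('c', ('ABORT', 1)), ('b', ('ACK',))]
```

Because a VA transition requires d < n', a node's distance never reaches n', so the largest distance it can place in an ABORT packet is n'.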

Since the new action we have added is just a combination of two existing actions 
in the original protocol, the new action preserves all local predicates and consistency 
conditions of the original protocol. However, it does slightly complicate the proof of 
termination (and hence of the causality condition). Clearly, if the reset protocol can 
keep producing such "spurious reset requests" the protocol may never terminate even 
if all real reset requests stop. Luckily, it is easy to show that within linear time after all 
local predicates of the original protocol hold, no (ABORT, n') packets can be received. 
Thus spurious reset requests stop in linear time, and the termination proof is only 
slightly more complicated. 

Next we have to design a local correction action for links, that is taken when a 
violation of the predicates is detected. The main difficulty about designing a correcting 
strategy is making it local, i.e., ensuring that when we correct a link we do not
affect the correctness of any other link. An interesting heuristic for this purpose is 
to notice that protocols designed for dynamic networks (like [AAG87]) had to deal 
with link failures and recovery. Now when a link fails and then immediately recovers, 
the original protocol must have established the local predicates for the link that failed 
without affecting the correctness of the other links. Thus the local correction procedure 
we use is essentially identical to the combined code in [AAG87] that is invoked when 
a link fails and recovers. This seems to be a powerful heuristic in general. 
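Under the assumptions that each direction of a link is modelled as a list of in-transit packets and that a failed link loses those packets, the fail-then-recover heuristic can be sketched as follows. The function names are hypothetical and only the two AAG rules quoted in Section C.1 are implemented; the follow-on steps a node takes once its last expected ack has been accounted for are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

@dataclass
class Node:
    mode: str = "Ready"
    parent: Optional[str] = None
    awaiting_ack: Set[str] = field(default_factory=set)

def link_fails(node: Node, v: str) -> None:
    # AAG rule: a node whose parent link fails becomes the root of its abort tree.
    if node.parent == v:
        node.parent = None
    # AAG rule: an ack expected over the failed link is treated as received.
    node.awaiting_ack.discard(v)

def link_recovers(node: Node, v: str) -> None:
    # AAG rule: nothing special is done when a link comes up.
    pass

def correct_link(nodes: Dict[str, Node], queues: Dict[Tuple[str, str], List[tuple]],
                 u: str, v: str) -> None:
    """Hypothetical local correction of link subsystem (u, v): simulate the link failing and
    immediately recovering. Only nodes u and v and the two queues of this link are touched."""
    queues[(u, v)].clear()           # assumption: packets in transit on a failed link are lost
    queues[(v, u)].clear()
    for a, b in ((u, v), (v, u)):
        link_fails(nodes[a], b)
        link_recovers(nodes[a], b)

nodes = {"u": Node(mode="Abort", parent="v", awaiting_ack={"v", "w"}), "v": Node(mode="Abort")}
queues = {("u", "v"): [("READY",)], ("v", "u"): [("ABORT", 2)]}
correct_link(nodes, queues, "u", "v")
print(nodes["u"].parent, nodes["u"].awaiting_ack)  # prints None {'w'}
```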



C.3 Mating Relation is not Transitive 

Consider the reset protocol described in Chapter 7 (or the protocol described in 
[AAG87]) when all local predicates hold. We use the scenario shown in Figure C.1
to show that the mating relation between signal intervals at neighboring nodes is not 
transitive. Thus the reset protocol may cause inconsistent states of the user protocol 
during initial signal intervals. However, the mating relation between final intervals 
is indeed transitive. The original paper [AAG87] showed an example in which the
mating relation was not transitive; however, the original paper used an optimization
(in which ABORT packets were only sent on edges on which messages had either been
sent or received in the last signal interval). Thus it was not clear whether the problem
was due to the optimization. In fact, the simple counterexample in [AAG87] does not
work as soon as we remove the optimization.

Figure C.1: Counterexample to Show that Mating Relation is not Transitive

However, we show in Figure C.1 that there is a (more complicated) counterexample.

In Figure C.1 there are three nodes A, B, and C that are connected in a cycle
(not shown). The vertical axis represents time; time increases as we go downwards.
Because this is a cycle, there is a link between A and C. When A sends a packet to
C (or vice versa), we show this by an arrow leaving A and going off the left
of the page. We then depict the packet receipt at C by an arrow coming in to C from
the right end of the page. For instance, in the second event at C, C sends an ABORT packet
to A, which is received as the second event at A.

In the initial state, mode(A) = Converge and parent_A = B. Thus A is waiting for
a READY packet from B, which B sends out at the start of the execution. Assume that
after B sends out the READY packet, B does a SIGNAL event immediately, so that
mode(B) = Ready at the start of the execution. We assume that mode(C) = Ready
and that there is no other packet in transit in the initial state. Clearly this is a valid
initial state.

The first event at C is that C sends a user message (say m1), which is received
by B and then delivered. Immediately after this, B sends a user message m2 to A.
Shortly after this, a reset request occurs at both B and C. This causes B to send an
ABORT packet to A and C, and C to send an ABORT packet to A and B. The ABORT
packet sent from C to A arrives before the READY packet arrives at A. Thus A will
send an ACK back. Similarly, B sends an ACK immediately back to C. The net result
is that by the time A receives its READY packet, C is already in Ready mode. We
also assume that C does a signal event immediately after going to Ready mode. Next,
the message m2 sent by B arrives at A, and C sends a third user message m3, which is
received by A. All this happens before the ABORT packet in transit from B arrives at
A.

The net result is as follows. C sends two messages m1 and m3 in two different
signal intervals at C; call them S_C and S'_C. Message m1 is received in a signal interval,
say S_B, at B. Node B then sends a message m2 in signal interval S_B that is received
in a signal interval (say S_A) at A. Finally, the message sent by C in S'_C is also received
in S_A.

By the definition of the mating relation, messages can only be received from a
unique mate at a neighbor. Thus S_C mates to S_B and S_B mates to S_A. Also, S'_C mates
to S_A. If the mating relation were transitive, S_C would mate to S'_C, which is impossible.
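The argument can be replayed mechanically. The sketch below uses hypothetical string labels for S_C, S'_C, S_B, and S_A: it takes the three mate pairs of the scenario, closes them under symmetry and transitivity, and reports every pair of distinct intervals at the same node that would become related. Since mates are always intervals at neighboring nodes, any such pair is a contradiction.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]

def symmetric_transitive_closure(pairs: Set[Pair]) -> Set[Pair]:
    """Symmetric, transitive closure of `pairs`, omitting reflexive pairs (x, x)."""
    rel = set(pairs) | {(b, a) for a, b in pairs}
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(rel), repeat=2):
            if b == c and a != d and (a, d) not in rel:
                rel.add((a, d))
                changed = True
    return rel

def same_node_clashes(mates: Set[Pair], owner: Dict[str, str]) -> List[Pair]:
    """Pairs of distinct intervals at one node that transitivity would force to be mates."""
    closure = symmetric_transitive_closure(mates)
    return sorted((x, y) for x, y in closure if x < y and owner[x] == owner[y])

owner = {"S_C": "C", "S'_C": "C", "S_B": "B", "S_A": "A"}
mates = {("S_C", "S_B"), ("S_B", "S_A"), ("S'_C", "S_A")}
print(same_node_clashes(mates, owner))
```

The single clash reported is the pair of intervals S_C and S'_C at node C.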

The significance of this counterexample is that arguments like "making a reset 
request guarantees a fresh start of the application protocol" do not work. If the 
application is doing any general form of checking, it may detect an inconsistent state 
and keep making reset requests, leading to non-termination. For example, if we were 
to use the global snapshot protocol due to [KP90] to check the application and then 
use our stabilizing reset, the protocol may never terminate! Our stabilizing reset is
still useful in a number of cases, for instance when used in conjunction with a local
snapshot protocol that checks closed local predicates (Chapter 8). In any case,
termination of the resulting protocol requires careful argument.

Another significant aspect of the counterexample is that it shows that our pairwise
definition of a mating relation between neighbors is probably the only statement
that one can make about the non-final intervals of a reset protocol. The original
specification of [AAG87] was state-based. It seems hard to specify this weak
safety property of non-final intervals in terms of states, as opposed to using an external
behavior specification. In either case, [AAG87] only needed to specify the behavior
in the final interval. For stabilizing applications, we must specify the behaviors in
non-final intervals.

It is interesting to note that the reset protocol of Arora and Gouda [AG90] (after 
adaptation to a message passing model) is likely to ensure a transitive mating relation. 
This is because it does a reset protocol on a tree and only the root (effectively) sends 
out the equivalent of ABORT packets. Of course, our stabilizing protocol can be first
used to construct a spanning tree, after which we do another reset which is guaranteed 
to work from a single root outwards. 
Appendix D

Proofs for Reset Protocol in Chapter 7



D.1 Proving that the Local Predicates of the Reset Protocol are Closed

Recall the definition of L_{u,v} in Definition 7.6.3.

Our strategy for showing that L_{u,v} is closed is as follows. If we had to consider every
action we would have a large number of cases to consider. However, each transition 
only affects a small number of variables. Thus we first isolate the transitions that can 
affect each variable. This reduces the number of cases we have to consider. 

First, we need to define the key transitions. Recall that the code in Figure 7.7 has 
certain code paths marked as VR, VA, DA, IA, FA, RA, and RR. We now define
these more precisely. It is helpful to refer to the code of Figure 7.7 in understanding 
the following definitions. 

An informal description of these transitions is as follows. A VR (for Valid Request)
transition is a reset request that causes a node to change its mode to Abort. A VA (for
Valid Abort) transition is the receipt of an (ABORT, d) packet with d < n' that causes
a node to change its mode to Abort. A DA (for Distance Invalid Abort) transition is
the receipt of an (ABORT, n') packet that causes a node to change its mode to Abort.
An IA (for Invalid Abort) transition is the receipt of an (ABORT, *) packet that
does not cause a node to change its mode to Abort. An FA (for Final Ack) transition
is the receipt of an ACK packet that causes a node, say u, to send an ACK packet to its
parent. It is not hard to see that the ack that was received must have been the last
ack that node u was waiting for. An RA (for Root Ack) transition is the receipt of an
ACK packet at a root node that causes the root node to change its mode to Ready. An
RR (for Regular Ready) transition is the receipt of a READY packet at a node that
causes the node to change its mode to Ready. More precisely:

Definition D.1.1 We call a transition (s, a, s') of ℛ:

• A VR transition at u if a = Request_u and s.mode(u) = Ready.

• A VA transition at u if a = RECEIVE_{*,u}(ABORT, d) and d < n' and s.mode(u) =
Ready.

• A VA transition at u with respect to v if a = RECEIVE_{v,u}(ABORT, d) and s.mode(u) =
Ready and d < n'.

• A DA transition at u with respect to v if a = RECEIVE_{v,u}(ABORT, d) and s.mode(u) =
Ready and d = n'.

• An IA transition at u with respect to v if a = RECEIVE_{v,u}(ABORT, d) and s.mode(u) ≠
Ready.

• An FA transition at u with respect to v if a = RECEIVE_{*,u}(ACK) and s.mode(u) =
Abort and s.parent_u = v and s'.mode(u) = Converge.

• An RA transition at u if a = RECEIVE_{*,u}(ACK) and s.mode(u) = Abort and
s'.mode(u) = Ready.

• An RR transition at u if a = RECEIVE_{*,u}(READY) and s.mode(u) ≠ Ready and
s'.mode(u) = Ready.
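The definition can be read as a classifier over a single transition at u. The encoding in the sketch below is hypothetical: an action is a tuple such as ("ABORT", v, d), ("ACK", v), ("READY", v), or ("REQUEST",), and the states before and after the transition are reduced to u's mode and parent. The function returns the transition type together with the neighbor it is taken with respect to (None where the definition names no neighbor), or None for a transition of none of the listed types.

```python
from typing import Dict, Optional, Tuple

Result = Optional[Tuple[str, Optional[str]]]   # (transition type, "with respect to" neighbor)

def classify(action: tuple, before: Dict[str, Optional[str]],
             after: Dict[str, Optional[str]], n_prime: int) -> Result:
    """Classify a transition (s, a, s') at node u along the lines of Definition D.1.1."""
    kind = action[0]
    if kind == "REQUEST":
        return ("VR", None) if before["mode"] == "Ready" else None
    if kind == "ABORT":
        _, v, d = action
        if before["mode"] != "Ready":
            return ("IA", v)
        # The distance field never exceeds n', so d >= n' here means d = n'.
        return ("VA", v) if d < n_prime else ("DA", v)
    if kind == "ACK" and before["mode"] == "Abort":
        if after["mode"] == "Converge":
            return ("FA", before["parent"])    # FA is taken with respect to u's parent
        if after["mode"] == "Ready":
            return ("RA", None)
    if kind == "READY" and before["mode"] != "Ready" and after["mode"] == "Ready":
        return ("RR", None)
    return None

ready = {"mode": "Ready", "parent": None}
abort = {"mode": "Abort", "parent": "p"}
print(classify(("ABORT", "v", 2), ready, abort, n_prime=4))                           # prints ('VA', 'v')
print(classify(("ACK", "w"), abort, {"mode": "Converge", "parent": "p"}, n_prime=4))  # prints ('FA', 'p')
```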

In the following lemma we will say that a Boolean condition b is established by some 
transition (s,a, s') if b is false in s but true in s' . Recall that we wish to prove that 
each L_{u,v} is closed. The next lemma makes this job easier by isolating the transitions
that can establish various Boolean conditions used in the definition of L_{u,v}.
We start with the observation that we do not need to consider the DA transition
explicitly, because a DA transition at u with respect to v can be simulated by two other
transitions: first a VR transition at u, followed immediately by an IA transition at u
with respect to v. Thus in the following lemmas and proofs we assume that the DA
transition does not exist.

Lemma D.1.2 

1. ack_u[v] = true can only be established by a VR or a VA transition at u.

2. A1(u,v) = true can only be established by a VR or a VA transition at u.

3. A2(u,v) = true can only be established by a VA transition at v with respect to u.

4. A3(u,v) = true can only be established by an IA or an FA transition at v with
respect to u.

5. A2(u,v) = false can only be established by an FA transition at v with respect to u.

6. ack_u[v] = false can only be established by a transition (s, a, s') such that s.ack_u[v] =
true and a = RECEIVE_{v,u}(ACK).

7. parent_u = v can only be established by a VA transition at u with respect to v.

8. mode(u) = Converge can only be established by an FA transition at u with respect
to some neighbor x.

9. mode(u) ≠ Converge can only be established by an RR transition at u.

10. C1(u,v) = true can only be established by an IA or an FA transition at u with
respect to v.

11. C2(u,v) = true can only be established by a transition (s, a, s') such that s.ack_v[u] =
true and a = RECEIVE_{u,v}(ACK).

12. C3(u,v) = true can only be established by an RR or RA transition at v.

13. C2(u,v) = false can only be established by an RR or RA transition at v.

Proof: By inspection of the code. ∎

Notice that in the code we often enqueue packets to queue_u[v]. But since queue_u[v]
is finite (it only has room for 5 packets), a transition (s, a, s') could cause a packet to
be dropped if queue_u[v] is full in the previous state. The
next lemma shows that packets will not be dropped if L_{u,v} holds in s. We will tacitly
assume this lemma in what follows without making explicit reference to it.

Lemma D.1.3 For any leader edge (u,v) and any transition (s, a, s') of ℛ, if s ∈ L_{u,v},
then in s', Q holds. Also, if as part of the code for a, a packet p is enqueued on queue_u[v],
then p will be added to the tail of queue_u[v] in s'.

Proof: To show Q, we show that when a packet of a certain type is in xqueue_u[v],
no action will enqueue a packet of the same type. The five types of packets to
consider are (ABORT, *) packets, ACK packets, READY packets, S-ACK packets, and
S-messages. The second part also follows from this claim because xqueue_u[v] has room
for five packets.

Suppose there is an (ABORT, *) packet in xqueue_u[v] in s. Then (by A) s.mode(u) =
Abort, and hence this cannot be a VA or VR transition at u. But these are the only
transitions that can enqueue another (ABORT, *) packet to xqueue_u[v] (see Lemma D.1.2,
item 2).

Suppose there is an ACK packet in xqueue_u[v] in s. Then (by B) A1(v,u) = false
and A2(v,u) = false in s. Thus (by Lemma D.1.2, item 4) this cannot be a transition
that can enqueue another ACK packet to xqueue_u[v].

Suppose there is a READY packet in xqueue_u[v] in s. Then (by G) either s.mode(u) =
Ready or there is an ABORT in xqueue_u[v] after the READY packet. In the former case,
(by Lemma D.1.2, item 12) this cannot be a transition that can enqueue another
READY packet to xqueue_u[v]. In the latter case, (by A) s.ack_u[v] = true, and (by B)
A3(u,v) is false in s'. Thus again (by Lemma D.1.2, item 12) this cannot be a
transition that can enqueue another READY packet to xqueue_u[v].

Suppose there is an S-message in xqueue_u[v] in s. Then (by H) freem_u[v] = false in s.
Thus even if a = SENDM_{u,v}(m), the code will not enqueue m on xqueue_u[v].

Suppose there is an S-ACK packet in xqueue_u[v] in s. Then (by H) buffer_u[v]
must be empty, and so the code cannot enqueue an S-ACK on xqueue_u[v]. ∎
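The pigeonhole step behind the second part of the lemma can be checked exhaustively. In this sketch the five packet-type tags and the BoundedQueue class are hypothetical, and the queue drops any packet that arrives when five are already present. The check confirms that if the queue holds at most one packet of each type and the new packet's type is not already present, the enqueue never overflows.

```python
from collections import deque
from itertools import combinations
from typing import Deque

PACKET_TYPES = ("ABORT", "ACK", "READY", "S-ACK", "S-MESSAGE")   # hypothetical tags
CAPACITY = 5                                                     # room for five packets

class BoundedQueue:
    def __init__(self) -> None:
        self.items: Deque[str] = deque()
        self.dropped = 0

    def enqueue(self, ptype: str) -> None:
        if len(self.items) >= CAPACITY:
            self.dropped += 1          # a full queue loses the packet
        else:
            self.items.append(ptype)   # otherwise the packet joins the tail

def one_per_type(q: BoundedQueue) -> bool:
    return all(q.items.count(t) <= 1 for t in PACKET_TYPES)

def never_drops() -> bool:
    """Enqueue each type into every queue holding at most one packet of each other type."""
    for new in PACKET_TYPES:
        others = [t for t in PACKET_TYPES if t != new]
        for k in range(len(others) + 1):
            for present in combinations(others, k):
                q = BoundedQueue()
                for t in present:
                    q.enqueue(t)
                assert one_per_type(q)
                q.enqueue(new)
                if q.dropped:
                    return False
    return True

print(never_drops())   # prints True
```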
We now proceed to prove that each of the predicates from A to H is closed in a
series of five lemmas: Lemma D.1.4, Lemma D.1.5, Lemma D.1.6, Lemma D.1.7, and
Lemma D.1.8.

Lemma D.1.4 For any leader edge (u,v) and any transition (s, a, s') of ℛ, if s ∈ L_{u,v},
then in s', A and B hold.

Proof: We consider four cases:

• Suppose ack_u[v] = false in s but ack_u[v] = true in s'. Then (by A) A1(u,v),
A2(u,v), and A3(u,v) are false in s. Also, by Lemma D.1.2, the transition must
be a VA or a VR transition at u, which causes A1(u,v) to become true and leaves
A2(u,v) and A3(u,v) false in s'.

• Suppose ack_u[v] = false in s and s'. Then (by A) A1(u,v), A2(u,v), and A3(u,v)
are false in s. Also, by Lemma D.1.2, items 1 and 2, A1(u,v) cannot hold in s'
without making ack_u[v] = true hold in s'. Also, by Lemma D.1.2, item 3, A2(u,v)
cannot become true in s' if A1(u,v) is false in s. Also, by Lemma D.1.2, item 4,
A3(u,v) cannot become true in s' if A1(u,v) and A2(u,v) are false in s. Thus
A1(u,v), A2(u,v), and A3(u,v) are false in s'.

• Suppose ack_u[v] = true in s but ack_u[v] = false in s'. By Lemma D.1.2, item 6,
a = RECEIVE_{v,u}(ACK), and so A3(u,v) is true in s. By B, A1(u,v) and A2(u,v)
are false in s. Also, (by Q) there is exactly one ACK packet in s.xqueue_v[u], and
hence A1(u,v), A2(u,v), and A3(u,v) are false in s'.

• Suppose ack_u[v] = true in s and s'. Then (by A) exactly one of A1(u,v), A2(u,v),
or A3(u,v) is true in s. If A1(u,v) is true in s, then A1(u,v) becomes false
in s' (by Lemma D.1.2, items 3 and 4) iff exactly one of A2(u,v) or A3(u,v)
becomes true in s'. If A2(u,v) is true in s, then A2(u,v) becomes false in s' (by
Lemma D.1.2, item 5) iff A3(u,v) becomes true in s'. Finally, if A3(u,v) is true
in s, then A3(u,v) cannot become false in s' (by Lemma D.1.2, item 6) without
causing ack_u[v] = false in s', a contradiction.

∎
Lemma D.1.5 For any leader edge (u,v) and any transition (s, a, s') of ℛ, if s ∈ L_{u,v},
then C and D are true for (u,v) in s'.

Proof: We consider cases:

• Suppose parent_u ≠ v in s'. Then C and D hold trivially in s'. Suppose parent_u ≠ v
in s and parent_u = v in s'. Then by Lemma D.1.2, item 7, this must be a VA
transition at u with respect to v. Thus s'.mode(u) = Abort. Also, A1(v,u) is
true in s, and hence (by A) A3(v,u) = false in s, and hence C1(u,v) is false in
s and hence in s'. Also, (by A) ack_v[u] = true in s and s', and hence C2(u,v) is
false in s'. Next, (by G and Q) we can infer that C3(u,v) is false in s and hence
in s'. (This is because if C3(u,v) were true in s, then (by G) there would have to be a
second (ABORT, *) packet in xqueue_v[u], which would violate Q.)

Thus in the remaining cases we assume that parent_u = v in s and s'.

• Suppose mode(u) ≠ Converge in s and s'. Then C1(u,v), C2(u,v), and C3(u,v)
are false in s. Also, by Lemma D.1.2, item 10, C1(u,v) can become true in s'
only if mode(u) = Converge in s'.¹ By Lemma D.1.2, item 11, C2(u,v) can
become true in s' only if C1(u,v) is true in s. Finally, by Lemma D.1.2, item
12, C3(u,v) can become true in s' only if C1(u,v) or C2(u,v) is true in s. Thus
C1(u,v), C2(u,v), and C3(u,v) are false in s'.

• Suppose mode(u) ≠ Converge in s and mode(u) = Converge in s'. Then (by C)
C1(u,v), C2(u,v), and C3(u,v) are false in s. Also, by Lemma D.1.2, item 8,
a is an FA transition at u with respect to v. Thus after this transition, C1(u,v)
becomes true in s', and C2(u,v) and C3(u,v) remain false in s'.

• Suppose mode(u) = Converge in s and mode(u) ≠ Converge in s'. Then by
Lemma D.1.2, item 9, a is an RR transition at u. Thus C3(u,v) is true in s;
hence (by D) C1(u,v) and C2(u,v) are false in s and s'. Also, (by Q) there
is exactly one READY packet in xqueue_v[u] in s, and this is removed by the
transition, so C3(u,v) is false in s'.



¹ Note that this cannot be an IA transition at u with respect to v, because otherwise s.mode(u) =
Abort, which together with s.parent_u = v would violate A.






• Suppose mode(u) = Converge in s and s'. Then (by C and D) exactly one of
C1(u,v), C2(u,v), or C3(u,v) is true in s. First suppose C1(u,v) is true in s.
Then (by A) ack_v[u] = true in s. Thus C1(u,v) becomes false in s' iff exactly
one of C2(u,v) or C3(u,v) becomes true in s'. Also, if C1(u,v) remains true in
s', then ack_v[u] = true in s' by Lemma D.1.2, item 6. Thus by Lemma D.1.2,
items 11 and 12, C2(u,v) and C3(u,v) cannot become true in s'.

Next, notice that since we have assumed that mode(u) = Converge in s, C1(u,v)
remains false in s' (by Lemma D.1.2, item 10) if it is false in s. Suppose C2(u,v)
is true in s. Then C2(u,v) becomes false in s' (by Lemma D.1.2, item 13) iff
C3(u,v) becomes true in s'.

Finally, if C3(u,v) is true in s, then C3(u,v) cannot become false in s' without
causing mode(u) = Ready in s', a contradiction. Also, by Lemma D.1.2, item 11,
C2(u,v) cannot become true in s' since C1(u,v) is false in s. ∎
Lemma D.1.6 For any leader edge (u,v) and any transition (s, a, s') of ℛ, if s ∈ L_{u,v},
then E and F are true for (u,v) in s'.

Proof: First consider F. Suppose there is no (ABORT, *) packet in xqueue_u[v] in
s. Then if an (ABORT, *) packet is in xqueue_u[v] in s', this must be (by
Lemma D.1.2, item 2) a VR or VA transition at u. However, after such a
transition there is an (ABORT, d) packet in xqueue_u[v], where d = dist_u + 1. Suppose
instead there is an (ABORT, d) packet in xqueue_u[v] in s with d = s.dist_u + 1. Then (by A)
s.mode(u) = Abort. Now the only transitions that can change dist_u are VR or VA
transitions at u, but such transitions are not enabled if s.mode(u) ≠ Ready.

Now consider E, and consider three cases.

• parent_u ≠ v in s': then E holds trivially.

• parent_u ≠ v in s but parent_u = v in s': Then by Lemma D.1.2, item 7, a must
be a RECEIVE_{v,u}(ABORT, d) event with s.mode(u) = Ready. But by F in s,
d = s.dist_v + 1. Thus in s', dist_u = dist_v + 1 and mode(u) ≠ Ready.
• parent_u = v in s and s'. If C3(u,v) is true in s, then C3(u,v) can become
false in s' only if a = RECEIVE_{v,u}(READY), but in that case s'.parent_u = nil. If, in
s, dist_u = dist_v + 1 and mode(v) ≠ Ready, then this transition cannot be a VR
or VA transition at v or u. But only such transitions can change either dist_v or
dist_u. Also, if s'.mode(v) = Ready, then this must be an RR or RA transition at v,
which results in C3(u,v) becoming true in s'. ∎
Lemma D.1.7 For any leader edge (u,v) and any transition (s, a, s') of ℛ, if s ∈ L_{u,v},
then G is true for (u,v) in s'.

Proof: Suppose in s there is no p in xqueue_u[v] such that p is a READY packet,
an S-message, or an S-ACK. Then if there is such a packet in s', by
the code, mode(u) = Ready in s', and we are done. Suppose in s there is a p in xqueue_u[v]
such that p is a READY packet, an S-message, or an S-ACK. Then by
G either there is an (ABORT, *) packet in xqueue_u[v] or s.mode(u) = Ready. Consider
the first case. Since the channel from u to v is FIFO, the (ABORT, *) packet cannot
be removed from xqueue_u[v] in s' without also removing packet p. Consider the second
case. Then if s'.mode(u) ≠ Ready, this transition must be a VA or VR transition
at u, which would result in adding an (ABORT, *) packet to the end of xqueue_u[v] in s'.
∎

Lemma D.1.8 For any leader edge (u,v) and any transition (s, a, s') of ℛ, if s ∈ L_{u,v},
then H is true for (u,v) in s'.

Proof: We consider all the actions a that can affect this predicate. For each action
considered, a symmetrical argument holds for the action with u and v interchanged.

If a = SENDM_{u,v}(m) and s.freem_u[v] = false, then message m is dropped and there
is no change to the concerned variables. If a = SENDM_{u,v}(m) and s.freem_u[v] = true,
then there is no S-message in M_{u,v} in s and no S-ACK in s.xqueue_v[u]. Also, (by
Q and H) there is at most one (ABORT, *), ACK, or READY packet in queue_u[v] in s.
Since queue_u[v] can store five packets, after this event m is placed in queue_u[v],
s'.freem_u[v] = false, and there is no S-ACK in s'.xqueue_v[u].
If $a = \mathrm{RECEIVE}_{u,v}(m)$ with $m \in \Sigma$, then $m$ is enqueued in $buffer_v[u]$ and none of the concerned variables change. Note that by $\mathcal{H}$, $buffer_v[u]$ is empty in $s$. If $a = \mathrm{RECEIVE}_{u,v}(\mathrm{ABORT}, *)$ and $s.buffer_v[u]$ is empty, then there is no change to the concerned variables. Suppose $a = \mathrm{RECEIVE}_{u,v}(\mathrm{ABORT}, *)$ and $s.buffer_v[u]$ contains a message $m$, or $a = \mathrm{RECEIVEM}_{u,v}(m)$. Then (by $\mathcal{H}$) in $s$, $freem_u[v] = \mathit{false}$, there is no $\Sigma$-ACK in $xqueue_v[u]$, and $M_{u,v}$ contains no message other than $m$ in $buffer_v[u]$. Then in $s'$, all variables remain unchanged except that $buffer_v[u]$ becomes empty and a $\Sigma$-ACK is added to $xqueue_v[u]$.

Similarly, if $a = \mathrm{RECEIVE}_{v,u}(\Sigma\text{-ACK})$, then by $\mathcal{H}$, in $s$ we have $freem_u[v] = \mathit{false}$, no $\Sigma$-message in $M_{u,v}$, and exactly one $\Sigma$-ACK in $xqueue_v[u]$. As a result, the $\Sigma$-ACK is removed from $xqueue_v[u]$ and $freem_u[v]$ becomes true in $s'$. ∎
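The case analysis in this proof can be mirrored by a small executable model. The sketch below is a hypothetical simplification, not the code of $\mathcal{R}$. It tracks only three things: $freem_u[v]$, the $\Sigma$-messages in $M_{u,v}$ (in flight or in $buffer_v[u]$), and the $\Sigma$-ACKs travelling back from $v$. Modes and the other packet types are omitted. Its `invariant` method encodes an assumed reading of $\mathcal{H}$, taken from how the proofs in this appendix use it: exactly one of the following holds.

- $freem_u[v]$ is true.
- A $\Sigma$-ACK is in transit from $v$.
- $M_{u,v}$ contains a $\Sigma$-message.

```python
from collections import deque


class LinkHandshake:
    """Hypothetical model of the u -> v message handshake (not the thesis code).

    Tracks freem_u[v], Sigma-messages in M_{u,v} (in flight or in buffer_v[u]),
    and Sigma-ACKs travelling back from v to u.
    """

    def __init__(self):
        self.freem = True        # freem_u[v]
        self.to_v = deque()      # Sigma-messages in xqueue_u[v] or on the link
        self.buffer_v = None     # buffer_v[u]: holds at most one message
        self.to_u = deque()      # Sigma-ACKs in xqueue_v[u] or on the link
        self.delivered = []      # messages handed to the user at v

    def sendm(self, m):
        """SENDM_{u,v}(m): accepted only if freem_u[v]; otherwise m is dropped."""
        if not self.freem:
            return False
        self.freem = False
        self.to_v.append(m)
        return True

    def receive_at_v(self):
        """RECEIVE_{u,v}(m): the head of the FIFO enters buffer_v[u]."""
        self.buffer_v = self.to_v.popleft()

    def receivem_at_v(self):
        """RECEIVEM_{u,v}(m): deliver the buffered message and queue a Sigma-ACK."""
        m, self.buffer_v = self.buffer_v, None
        self.delivered.append(m)
        self.to_u.append("SIGMA_ACK")
        return m

    def abort_at_v(self):
        """RECEIVE_{u,v}(ABORT, *) with a buffered message: discard it, queue a Sigma-ACK."""
        if self.buffer_v is not None:
            self.buffer_v = None
            self.to_u.append("SIGMA_ACK")

    def receive_ack_at_u(self):
        """RECEIVE_{v,u}(Sigma-ACK): u may accept a new message again."""
        self.to_u.popleft()
        self.freem = True

    def invariant(self):
        """Assumed form of H: exactly one of the three cases holds."""
        in_m = len(self.to_v) + (self.buffer_v is not None)
        return int(self.freem) + len(self.to_u) + in_m == 1
```

The model replays the transitions considered above in order:

1. A safe send.
2. An unsafe send, which is dropped.
3. Receipt into the buffer.
4. Delivery, or an ABORT that empties the buffer.
5. The returning $\Sigma$-ACK.

`invariant()` stays true at every step.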



D.2 Proving that the Reset Protocol Behaviors are Timely, Causal, and Consistent

We will show that every behavior of $\mathcal{R}|L$ is timely and causal and satisfies the consistency property. We do so in the next five subsections. First, we prove a series of useful preliminary lemmas. In the second subsection, we prove that every behavior is timely. In the third subsection, we prove that every behavior satisfies the consistency property. In the fourth subsection, we prove that every behavior is causal. Finally, we tie everything together and show that every behavior $\beta$ of $\mathcal{R}|L$ is in RP.

We will assume this claim implicitly in what follows. 

D.2.1 Useful Claims and Lemmas for Reset Protocol 

The first lemma is the important Termination Lemma, which states that the mode of a node becomes Ready within $O(n)$ time.

Lemma D.2.1 (Termination Lemma) Consider an execution $\alpha = s_0, a_1, s_1, \ldots$ of $\mathcal{R}|L$. For any state $s_i$ there is another state $s_j$ that occurs within $O(n)$ time after $s_i$ such that $s_j.mode(u) = \mathit{Ready}$.
Proof: A formal argument can be based on the intuitive "proof" given in Section 7.7.1. We omit it here. ∎

The next lemma is the Signal Lemma. It states that once the status of a node $u$ is off (recall that this means that either $signalbit_u = \mathit{true}$ or $mode(u) \neq \mathit{Ready}$), a $\mathrm{SIGNAL}_u$ event is guaranteed to occur within linear time.

Lemma D.2.2 (Signal Lemma) For any execution $\alpha$ of $\mathcal{R}|L$, if $s_i.status(u) = \mathit{off}$, then a $\mathrm{SIGNAL}_u$ event occurs within $O(n)$ time after $s_i$.

Proof: Suppose $s_i.mode(u) = \mathit{Ready}$ and $s_i.signalbit_u = \mathit{true}$. Then it is easy to see from the code that a $\mathrm{SIGNAL}_u$ action is enabled and remains enabled until it occurs, which happens within constant time after $s_i$.

The only other possibility is that $s_i.mode(u) \neq \mathit{Ready}$. Let $s_k$ be the first state after $s_i$ such that $s_k.mode(u) = \mathit{Ready}$. We know from Lemma D.2.1 that such a state exists and that $s_k$ occurs within $O(n)$ time after $s_i$. Then $s_{k-1}.mode(u) \neq \mathit{Ready}$ and $s_k.mode(u) = \mathit{Ready}$. Thus, from the code, it is easy to see that $s_k.signalbit_u = \mathit{true}$. Now we are back in the first case, and hence a $\mathrm{SIGNAL}_u$ event must occur within constant time after $s_k$. ∎

The next claim is a variation of the Signal Lemma. It states that the status of a node $u$ cannot change from off to on until the next $\mathrm{SIGNAL}_u$ event.

Claim D.2.3 For any execution $\alpha$ of $\mathcal{R}|L$, if $s_i.status(u) = \mathit{off}$, $s_k.status(u) = \mathit{on}$, and $k > i$, then there is a $\mathrm{SIGNAL}_u$ event between $s_i$ and $s_k$ in $\alpha$.

Proof: From the code, the only way $status(u)$ can change from off to on is by a $\mathrm{SIGNAL}_u$ event. Note that any action that changes $mode(u)$ to Ready also sets $signalbit_u$ to true, which leaves $status(u)$ unchanged. ∎
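The definition of status used by the Signal Lemma and Claim D.2.3 can be captured in a few lines. In the hypothetical sketch below, `NodeStatus` and the placeholder mode name `"NotReady"` are illustrative assumptions. The only rules it encodes are the ones stated in the text:

- $status(u)$ is off iff $signalbit_u$ is true or $mode(u) \neq \mathit{Ready}$.
- Returning to Ready also sets $signalbit_u$.
- $\mathrm{SIGNAL}_u$ is enabled when $mode(u) = \mathit{Ready}$ and $signalbit_u$ is true, and it clears $signalbit_u$.

```python
class NodeStatus:
    """Hypothetical model of status(u): off iff signalbit_u or mode(u) != Ready."""

    def __init__(self):
        self.mode = "Ready"
        self.signalbit = False

    def status(self):
        return "off" if self.signalbit or self.mode != "Ready" else "on"

    def leave_ready(self):
        # e.g. a REQUEST_u or an arriving ABORT; "NotReady" is a placeholder name
        self.mode = "NotReady"

    def enter_ready(self):
        # returning to Ready also raises signalbit_u, so status(u) stays off
        self.mode = "Ready"
        self.signalbit = True

    def signal_enabled(self):
        return self.mode == "Ready" and self.signalbit

    def signal(self):
        # SIGNAL_u: the only action that turns status(u) from off to on
        if not self.signal_enabled():
            raise RuntimeError("SIGNAL_u is not enabled")
        self.signalbit = False
```

`enter_ready` raises `signalbit` at the same moment that `mode` becomes `"Ready"`. So the only method that can move `status()` from `"off"` to `"on"` is `signal`, which is the content of Claim D.2.3.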

Next we show that any packet queued on the outbound queue for a link will be 
delivered in constant time. This follows from the fact that the outbound queue has a 
size of at most 4. 

Claim D.2.4 Consider any pair of neighbors $u, v$ and any execution $\alpha$. If there is a packet $p$ in $xqueue_u[v]$ in some state $s_i$ of $\alpha$, then within constant time after $s_i$ there is a $\mathrm{RECEIVE}_{u,v}(p)$ event. (That is, any packet in either the outbound queue for a link or on the link itself is delivered within constant time.)
Proof: We know from Lemma 5.6.6 that the packet at the head of $queue_u[v]$ is placed in $Q_{u,v}$ (i.e., is placed in the channel) within $t_p$ time and that any packet in $Q_{u,v}$ is delivered within $t_l$ time. The claim follows since $queue_u[v]$ has a size of at most 4, and $t_p$ is a constant. ∎
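As a worked bound, here is a reading of the proof under three assumptions. The queue holds at most 4 packets. Each packet takes at most $t_p$ to move from the head of $queue_u[v]$ into $Q_{u,v}$. A packet then spends at most $t_l$ in $Q_{u,v}$.

$$
T_{\text{deliver}}(p) \;\le\; \underbrace{4\,t_p}_{\text{wait in } queue_u[v]} \;+\; \underbrace{t_l}_{\text{transit in } Q_{u,v}}
$$

This is a constant provided $t_p$ and $t_l$ are constants.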

Next, we show that within constant time the signalbit variable at a node becomes 
false. 

Claim D.2.5 Consider any node $u$ and any execution $\alpha$. If $signalbit_u = \mathit{true}$ in some state $s_i$ of $\alpha$, then within constant time after $s_i$ there is a state in which $signalbit_u = \mathit{false}$.

Proof: This follows because if $signalbit_u = \mathit{true}$ in $s_i$, the $\mathrm{SIGNAL}_u$ event is enabled and $signalbit_u$ remains true unless the $\mathrm{SIGNAL}_u$ event occurs (see code). Thus, since each $\mathrm{SIGNAL}_u$ action is in a separate class, a $\mathrm{SIGNAL}_u$ action occurs within constant time after $s_i$, resulting in a state (see code) in which $signalbit_u = \mathit{false}$. ∎

The next claim (which is used to show the consistency property) states the following. Suppose there is some interval in which the mode of a node $u$ is Ready at the start and end of the interval but is not Ready somewhere within the interval. Consider any neighbor $v$ of $u$. Then there must be some point within the interval at which an ABORT packet arrives at $v$; moreover, at this point any messages in transit from $u$ to $v$ have been "flushed" out.

Claim D.2.6 Consider any pair of neighbors $u, v$ and any execution $\alpha$ of $\mathcal{R}|L$. Consider any three states $s_i$, $s_j$ and $s_k$ in $\alpha$ such that $i < j < k$ and $s_i.mode(u) = \mathit{Ready}$, $s_j.mode(u) \neq \mathit{Ready}$ and $s_k.mode(u) = \mathit{Ready}$. Then there is some $j'$ such that $i < j' < k$, $a_{j'}$ is a $\mathrm{RECEIVE}_{u,v}(\mathrm{ABORT}, *)$ action, and $s_{j'}.M_{u,v}$ does not contain any $\Sigma$-message.

Proof: Let $s_l$ be the first state after $s_i$ in which $s_l.mode(u) \neq \mathit{Ready}$. Such a state must exist by hypothesis, and $l \le j$. Thus, by the code, $\mathcal{A}_3(u,v)$ is true in $s_l$ (i.e., there is an ABORT packet in $xqueue_u[v]$ in $s_l$). Let $s_{j'}$ be the first state after $s_l$ in which $\mathcal{A}_3(u,v)$ is false (i.e., the first state after $s_l$ in which the ABORT packet has been delivered). We know from Claim D.2.4 that such a state exists. Also, we know from $\mathcal{A}$ that $mode(u) \neq \mathit{Ready}$ throughout the interval $[s_l, s_{j'}]$. Thus $j' < k$. Also, $a_{j'}$ must be a $\mathrm{RECEIVE}_{u,v}(\mathrm{ABORT}, *)$ event, and following such an event it is easy to see from the code that $mode(v) \neq \mathit{Ready}$ and that $buffer_v[u]$ is empty. Moreover, since $mode(u) \neq \mathit{Ready}$ in the interval $[s_l, s_{j'}]$, no $\Sigma$-message was added to $M_{u,v}$ in that interval. Finally, any $\Sigma$-message in $xqueue_u[v]$ in $s_l$ must have been removed from $xqueue_u[v]$ before $s_{j'}$, because $xqueue_u[v]$ is a FIFO queue. Thus $s_{j'}.M_{u,v}$ does not contain any $\Sigma$-message. ∎
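The flushing argument relies on only three ingredients:

- FIFO order on the link.
- $u$ adds no $\Sigma$-message while $mode(u) \neq \mathit{Ready}$.
- The arriving ABORT empties $buffer_v[u]$.

The hypothetical replay below checks exactly those ingredients. The event names, the list-based buffer, and the rule that $u$ may not re-enter Ready while its ABORT is still in transit (a stand-in for predicate $\mathcal{A}$) are assumptions of the sketch.

```python
from collections import deque


def abort_snapshots(events):
    """Replay events on a FIFO link u -> v (hypothetical model of Claim D.2.6).

    events: ("send", m), ("abort",), ("ready",) or ("deliver",).
    Returns, for every ABORT delivered at v, the Sigma-messages still in M_{u,v}.
    """
    link = deque()       # xqueue_u[v] followed by the channel, in FIFO order
    buffer_v = []        # buffer_v[u]
    ready = True         # mode(u) == Ready
    snapshots = []
    for ev in events:
        if ev[0] == "send":
            if ready:    # no Sigma-message is added while mode(u) != Ready
                link.append(("MSG", ev[1]))
        elif ev[0] == "abort":
            ready = False                # u leaves Ready and enqueues an ABORT
            link.append(("ABORT", None))
        elif ev[0] == "ready":
            # assumption standing in for predicate A: u stays non-Ready
            # until its ABORT has been delivered
            if any(kind == "ABORT" for kind, _ in link):
                raise ValueError("ABORT still in transit")
            ready = True
        elif ev[0] == "deliver" and link:
            kind, m = link.popleft()
            if kind == "MSG":
                buffer_v.append(m)
            else:
                buffer_v.clear()         # the ABORT empties buffer_v[u]
                left = [x for k, x in link if k == "MSG"] + buffer_v
                snapshots.append(left)
    return snapshots
```

In the first test trace, the message `"c"` offered while $u$ is not Ready is never enqueued. So when the ABORT arrives, the snapshot is empty, just as the claim asserts for $s_{j'}.M_{u,v}$.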

The next claim (which is also used to show the consistency property) is a mild corollary of the previous claim. Suppose there is some interval such that $mode(u) = \mathit{Ready}$ at the start of the interval, $u$'s signal bit is false at some point within the interval, and $u$ does a signal at the end of the interval. Then the mode of $u$ must be Ready at the end of the interval and must have been not Ready somewhere within the interval. Thus the previous claim applies, along with its consequences.

Claim D.2.7 Consider any pair of neighbors $u, v$ and any execution $\alpha$ of $\mathcal{R}|L$. Consider any state $s_i$ such that $s_i.mode(u) = \mathit{Ready}$ and another state $s_{i'}$, $i' > i$, such that $s_{i'}.signalbit_u = \mathit{false}$. Suppose that after $s_{i'}$ there is a $\mathrm{SIGNAL}_u$ action $a_k$. Then there is some $j$ with $i < j < k$ such that:

• $a_j$ is a $\mathrm{RECEIVE}_{u,v}(\mathrm{ABORT}, *)$ action.

• $s_j.status(v) \neq \mathit{on}$.

• $s_j.M_{u,v}$ does not contain any $\Sigma$-message.

Proof: In the state just before $a_k$, $signalbit_u = \mathit{true}$, but in $s_k$, $signalbit_u = \mathit{false}$. Let $s_l$ be the last state before $s_{k-1}$ in which $signalbit_u = \mathit{false}$. Also $l \ge i'$, because $s_{i'}.signalbit_u = \mathit{false}$. Thus, from the code, it must be that $s_l.mode(u) \neq \mathit{Ready}$ and $s_{l+1}.mode(u) = \mathit{Ready}$. The claim follows by applying Claim D.2.6 to the three states $s_i$, $s_l$ and $s_{l+1}$, and by observing that in any state that follows a $\mathrm{RECEIVE}_{u,v}(\mathrm{ABORT}, *)$ action, $mode(v) \neq \mathit{Ready}$ and hence $status(v) \neq \mathit{on}$. ∎

Notice that the requirements of the previous claim are satisfied if $status(u) = \mathit{on}$ at the start of the interval and there is a signal at $u$ at the end of the interval. We state this corollary of the previous claim as a separate claim because it is used often.
Claim D.2.8 Consider any pair of neighbors $u, v$ and any execution $\alpha$ of $\mathcal{R}|L$. Consider any state $s_i$ such that $s_i.status(u) = \mathit{on}$. Suppose that after $s_i$ there is a $\mathrm{SIGNAL}_u$ action $a_k$. Then there is some $j$ with $i < j < k$ such that:

• $a_j$ is a $\mathrm{RECEIVE}_{u,v}(\mathrm{ABORT}, *)$ action.

• $s_j.status(v) \neq \mathit{on}$.

• $s_j.M_{u,v}$ does not contain any $\Sigma$-message.

Proof: Follows from Claim D.2.7. ∎

The next three claims are all used to prove the timeliness property. 

Consider any two neighboring nodes $u$ and $v$. The next claim states that if the status of both $u$ and $v$ remains on for a sufficiently large constant amount of time, then $u$ must deliver a free event (indicating that $u$ is willing to accept a new message to be sent to $v$) within this time.

Claim D.2.9 Consider any pair of neighbors $u, v$ and any execution $\alpha$ of $\mathcal{R}|L$. Consider any state $s_i$. Then within constant time after $s_i$ either a $\mathrm{FREEM}_{u,v}$ event occurs or there is a state $s_j$ such that either $s_j.status(u) = \mathit{off}$ or $s_j.status(v) = \mathit{off}$.

Proof: Suppose that $status(u) = \mathit{on}$ and $status(v) = \mathit{on}$ for $c$ time after $s_i$, where $c$ is a constant large enough to make the following argument work. By $\mathcal{H}$, in state $s_i$ either:

• $freem_u[v]$ is true;

• $xqueue_v[u]$ contains a $\Sigma$-ACK; or

• $M_{u,v}$ contains a $\Sigma$-message.

In the first case, assuming $c$ is large enough, $status(u) = \mathit{on}$ for a constant time after $s_i$, which causes a $\mathrm{FREEM}_{u,v}$ event to occur within constant time after $s_i$.

In the second case, if there is a $\Sigma$-ACK in $xqueue_v[u]$, then by Claim D.2.4 a $\mathrm{RECEIVE}_{v,u}(\Sigma\text{-ACK})$ event occurs within constant time, which causes $freem_u[v]$ to become true, leaving us in the first case.
For the third case, an argument similar to the second case shows that within constant time after $s_i$, a $\mathrm{RECEIVE}_{u,v}(m)$ event occurs, which causes $m$ to be placed in $buffer_v[u]$. Assuming again that $c$ is large enough, this implies that within constant time after $s_i$ either a $\mathrm{RECEIVEM}_{u,v}(m)$ event or a $\mathrm{RECEIVE}_{u,v}(\mathrm{ABORT}, *)$ event occurs. Either of these events causes a $\Sigma$-ACK to be placed in $xqueue_v[u]$, which brings us back to the second case. ∎

Consider any two neighboring nodes $u$ and $v$. The next claim states that if the status of both $u$ and $v$ remains on for a sufficiently large constant amount of time after a message is sent from $u$ to $v$, then the message is delivered to $v$ within this time.

Claim D.2.10 Consider any pair of neighbors $u, v$ and any execution $\alpha$ of $\mathcal{R}|L$. Consider any safe $\mathrm{SENDM}_{u,v}(m)$ action in $\alpha$. Then within constant time after this action either a $\mathrm{RECEIVEM}_{u,v}(m)$ event occurs or there is a state $s_j$ such that either $s_j.status(u) = \mathit{off}$ or $s_j.status(v) = \mathit{off}$.

Proof: Similar to the proof of Claim D.2.9. Let us denote the safe $\mathrm{SENDM}_{u,v}(m)$ event by $a_j$. Suppose that $status(u) = \mathit{on}$ and $status(v) = \mathit{on}$ for $c$ time after $a_j$, where $c$ is a constant large enough to make the following argument work. Since $a_j$ is a safe send, it is easy to see from the code that $freem_u[v]$ is true in the state before $a_j$. Also, by our assumption, $status(u) = \mathit{on}$ in the state after $a_j$. So $m$ is placed in $queue_u[v]$ in the state after $a_j$. Thus, by Claim D.2.4, within constant time after $a_j$, $m$ is placed in $buffer_v[u]$, and if $status(v) = \mathit{on}$ for constant time after this, a $\mathrm{RECEIVEM}_{u,v}(m)$ event occurs. ∎

D.2.2 Every Behavior of $\mathcal{R}|L$ is timely

We prove that every behavior $\beta$ of $\mathcal{R}|L$ is timely by showing each of the four properties in Definition 7.3.3. Corresponding to the four properties, we have four lemmas.

We leave the first property of a timely behavior (i.e., that all messages received after $O(n)$ time are normal) to the end of this section. We start by showing the second property.

Lemma D.2.11 (Periodic Free Events) Consider any pair of neighbors $u, v$, any execution $\alpha$ of $\mathcal{R}|L$, and any state $s_j$ in $\alpha$. Then either a $\mathrm{FREEM}_{u,v}$ event occurs within constant time after $s_j$, or a $\mathrm{SIGNAL}_u$ action occurs within $O(n)$ time after $s_j$, or a $\mathrm{SIGNAL}_v$ action occurs within $O(n)$ time after $s_j$.

Proof: We know from Claim D.2.9 that for some constant $c$, within $c$ time after $s_j$ either a $\mathrm{FREEM}_{u,v}$ event occurs or there is a state $s_k$ such that either $s_k.status(u) = \mathit{off}$ or $s_k.status(v) = \mathit{off}$. In the first case, we are done. In the second case, we know from the Signal Lemma (Lemma D.2.2) that within $O(n)$ time after $s_k$ either a $\mathrm{SIGNAL}_u$ or a $\mathrm{SIGNAL}_v$ event occurs. ∎

Next, we prove that $\mathcal{R}|L$ satisfies the third property of a timely behavior.

Lemma D.2.12 (Timely Message Delivery) Consider any pair of neighbors $u, v$ and any execution $\alpha$ of $\mathcal{R}|L$. Consider any $a_j$ that is a safe $\mathrm{SENDM}_{u,v}(m)$ action in $\alpha$. Then either a $\mathrm{RECEIVEM}_{u,v}(m)$ event occurs within constant time after $a_j$, or a $\mathrm{SIGNAL}_u$ action occurs within $O(n)$ time after $a_j$, or a $\mathrm{SIGNAL}_v$ action occurs within $O(n)$ time after $a_j$.

Proof: We know from Claim D.2.10 that for some constant $c$, within $c$ time after $a_j$ either a $\mathrm{RECEIVEM}_{u,v}(m)$ event occurs or there is a state $s_k$ such that either $s_k.status(u) = \mathit{off}$ or $s_k.status(v) = \mathit{off}$. In the first case, we are done. In the second case, we know from the Signal Lemma (Lemma D.2.2) that within $O(n)$ time after $s_k$ either a $\mathrm{SIGNAL}_u$ or a $\mathrm{SIGNAL}_v$ event occurs. ∎

Next, we prove that $\mathcal{R}|L$ satisfies the fourth property of a timely behavior.

Lemma D.2.13 (Signals at a Node Induce Signals at Neighbors) Consider any pair of neighbors $u, v$ and any execution $\alpha$ of $\mathcal{R}|L$. There is some constant $c$ such that for every $\mathrm{SIGNAL}_u$ event $a_j$ that occurs at time greater than $\alpha.start + c \cdot n$, there is a $\mathrm{SIGNAL}_v$ event that occurs within linear time before or after $a_j$.

Proof: First, within linear time of the start of $\alpha$, there must be some state $s_h$ in which $mode(u) = \mathit{Ready}$ (by the Termination Lemma). Within constant time after $s_h$, there must be some state $s_i$ in which $signalbit_u = \mathit{false}$, by Claim D.2.5. Consider any $\mathrm{SIGNAL}_u$ action $a_j$ that occurs after state $s_i$. In the state before $a_j$, $signalbit_u = \mathit{true}$, but in state $s_i$, $signalbit_u = \mathit{false}$. Let $s_k$ be the last state before $s_j$ in which $signalbit_u = \mathit{false}$. By Claim D.2.5, $s_j$ occurs within constant time after $s_k$. Also, since $s_{k+1}.signalbit_u = \mathit{true}$, the code tells us that $s_k.mode(u) \neq \mathit{Ready}$. But we know that $s_h.mode(u) = \mathit{Ready}$. Let $s_l$ be the last state before $s_k$ in which $mode(u) = \mathit{Ready}$. By the Termination Lemma, $s_l$ occurs within linear time before $s_k$.

Thus we have identified two states $s_l$ and $s_k$, with $k > l$, such that both states occur before the $\mathrm{SIGNAL}_u$ event $a_j$ and such that $s_l.mode(u) = \mathit{Ready}$ and $s_k.signalbit_u = \mathit{false}$. By applying Claim D.2.7 to $s_l$, $s_k$ and $a_j$, we know that there is some state $s_m$ in the interval $[s_l, s_j]$ in which $status(v) \neq \mathit{on}$. Thus, by the Signal Lemma, a $\mathrm{SIGNAL}_v$ event occurs within linear time after $s_m$. But since $s_l$ occurs within linear time before $s_j$, this $\mathrm{SIGNAL}_v$ event occurs within linear time before or after $a_j$. ∎

Finally, we prove that $\mathcal{R}|L$ satisfies the first property of a timely behavior. This requires more work, so we start with two claims, both of which are quite intuitive.

The first claim states that in any suffix of an execution, all except possibly the first 
packet received on a link are normal packets that have been sent in this execution. 

Claim D.2.14 Consider any pair of neighbors $u, v$ and any execution $\alpha$ of $\mathcal{R}|L$. Let $s_i$ be the first state in $\alpha$ such that there is no $\Sigma$-message in $s_i.M_{u,v}$. Consider any $a_k$ that is a $\mathrm{RECEIVE}_{u,v}(m)$ event that occurs after $s_i$ in $\alpha$. Then:

• There is a $\mathrm{SEND}_{u,v}(m)$ action $a_j$ before $a_k$ such that there are no $\mathrm{RECEIVE}_{u,v}(*)$ events between $a_j$ and $a_k$. We call the earliest such $a_j$ the send corresponding to $a_k$ in $\alpha$.

• In state $s_j$ (i.e., the state immediately after $a_j$ in $\alpha$), $status(u) = \mathit{on}$, and in state $s_k$, $status(v) = \mathit{on}$.

• In all states in the interval $[s_j, s_{k-1}]$, $m$ is in $M_{u,v}$.

Proof: It is clear from the code that in $s_{k-1}$, $m$ must be in $buffer_v[u]$, and hence $m$ is in $M_{u,v}$. But since $M_{u,v}$ is empty in $s_i$, there must be a $\mathrm{SEND}_{u,v}(m)$ action between $s_i$ and $s_{k-1}$ that added $m$ to $M_{u,v}$. Let $a_j$ be the first such action that occurs before $a_k$. In the state immediately after $a_j$, by the code, $status(u) = \mathit{on}$. Clearly, there cannot be a $\mathrm{RECEIVE}_{u,v}(*)$ action between $a_j$ and $a_k$, because (from $\mathcal{H}$) $M_{u,v}$ contains at most one message in any state. But a $\mathrm{RECEIVE}_{u,v}(m)$ action is the only action that can remove $m$ from $M_{u,v}$, and hence $m$ is in $M_{u,v}$ throughout the interval $[s_j, s_{k-1}]$. Also, from the code, $status(v) = \mathit{on}$ in the state immediately after a $\mathrm{RECEIVE}_{u,v}(m)$ action. ∎
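Assume that at most one message is in $M_{u,v}$ at a time. Then the "send corresponding to" relation of Claim D.2.14 can be computed directly from a single-link trace. The sketch below uses a hypothetical trace format of `("send", m)` and `("receive", m)` pairs, and the function name is illustrative. A receive with no matching send in its window, such as a stale message present in the initial state, is left unmatched. That is, it is not normal.

```python
def corresponding_sends(trace):
    """Pair each receive with its corresponding send (hypothetical trace format).

    trace: list of ("send", m) / ("receive", m) events on one link u -> v.
    A receive at index k is paired with the earliest ("send", m) after the
    previous receive, i.e. with no RECEIVE_{u,v}(*) event in between.
    Returns {receive_index: send_index}; unmatched receives are omitted.
    """
    pairs = {}
    window_start = 0                     # index just after the previous receive
    for k, (kind, m) in enumerate(trace):
        if kind != "receive":
            continue
        for j in range(window_start, k):
            if trace[j] == ("send", m):
                pairs[k] = j             # earliest qualifying send
                break
        window_start = k + 1
    return pairs
```

For instance, consider the trace `[("receive", "x"), ("send", "a"), ("receive", "a")]`. The first receive has no corresponding send (it delivers a stale message), while the second is paired with index 1.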
Claim D.2.15 Consider any pair of neighbors $u, v$, any execution $\alpha$ of $\mathcal{R}|L$, and any state $s_i$ in $\alpha$. Then there is some state $s_m$ that occurs within $O(n)$ time after $s_i$ such that there is no $\Sigma$-message in $s_m.M_{u,v}$.

Proof: From Lemma D.2.1, there is a state $s_j$ that occurs within $O(n)$ time after $s_i$ such that $s_j.mode(v) = \mathit{Ready}$. From Claim D.2.5, within constant time after $s_j$ there is a state $s_{j'}$ in which $signalbit_v = \mathit{false}$.

From Lemma D.2.1, there is a state $s_k$ that occurs within $O(n)$ time after $s_{j'}$ such that $s_k.mode(u) = \mathit{Ready}$. From Claim D.2.5, within constant time after $s_k$ there is a state $s_{k'}$ in which $signalbit_u = \mathit{false}$.

From Lemma D.2.11, some event $a_l$ occurs within $O(n)$ time after $s_{k'}$, where $a_l$ is either a $\mathrm{FREEM}_{u,v}$ event, a $\mathrm{SIGNAL}_u$ event, or a $\mathrm{SIGNAL}_v$ event. In the first case, we are done by predicate $\mathcal{H}$, which shows that $\mathrm{FREEM}_{u,v}$ cannot occur unless $M_{u,v}$ is empty. In the second case, $a_l$ is a $\mathrm{SIGNAL}_u$ event; then by Claim D.2.7 there is some state $s_m$ in the interval $[s_k, s_l]$ such that there is no $\Sigma$-message in $s_m.M_{u,v}$. If $a_l$ is a $\mathrm{SIGNAL}_v$ event, then by Claim D.2.7 (with the roles of $u$ and $v$ interchanged) there is a state, say $s_{l'}$, in the interval $[s_j, s_l]$ such that $s_{l'}.status(u) = \mathit{off}$. Thus, by the Signal Lemma (Lemma D.2.2), a $\mathrm{SIGNAL}_u$ event occurs within $O(n)$ time after $s_{l'}$, which brings us back to the second case. ∎

We can now show the first property of a timely behavior (see Definition 7.3.3), namely that every message received after $O(n)$ time is normal; it follows almost immediately from the last two claims.

Lemma D.2.16 (Normal Receipt of Messages) Consider any execution $\alpha$ of $\mathcal{R}|L$ and any suffix $\gamma$ of $\alpha$ with first state $s$. There is some constant $c$ such that every receive event that occurs in $\gamma$ at time greater than $s.time + c \cdot n$ is normal. Also, if $a_j$ is any normal receive event and $a_i$ is the send corresponding to $a_j$, then $a_j$ occurs within $O(n)$ time after $a_i$.

Proof: Consider any pair of neighbors $u, v$. By Claim D.2.15, there is some $s_j$ that occurs within $O(n)$ time after $s$ such that there is no $\Sigma$-message in $s_j.M_{u,v}$. Let us call $j$ the quiescent index for link $(u,v)$. Further, let $k$ be the largest quiescent index over all links $(u,v)$. Clearly $s_k$ occurs within $O(n)$ time after $s$. Also, by Claim D.2.14, all receive events that occur after $s_k$ in $\gamma$ are normal.
Also, let $a_j$ be any normal receive event and $a_i$ be the send corresponding to $a_j$. By Claim D.2.14, $m$ is in $M_{u,v}$ in all states in the interval $[s_i, s_{j-1}]$. But by Claim D.2.15, there is some state $s_k$ that occurs within $O(n)$ time after $s_i$ such that there is no $\Sigma$-message in $s_k.M_{u,v}$. Thus $a_j$ occurs within $O(n)$ time after $a_i$. ∎

And now we come to the main result of this subsection: 

Lemma D.2.17 Every behavior of $\mathcal{R}|L$ is timely.

Proof: Immediate from Definition 7.3.3 and Lemmas D.2.16, D.2.13, D.2.11, and D.2.12. ∎

D.2.3 Every Behavior of $\mathcal{R}|L$ satisfies the consistency property

We prove that every behavior $\beta$ of $\mathcal{R}|L$ satisfies the consistency property by showing each of the five properties in Definition 7.3.5. We begin with a preliminary claim that is essential for proving the consistency property.

The claim states the following. Consider some interval and some node $u$, and suppose that the status of $u$ is on at the start of the interval and the interval ends with $u$ receiving a normal message $m$ from $v$. Suppose also that there is a $\mathrm{SIGNAL}_u$ event in the interval. Then from Claim D.2.8 we know that there is some point within the interval at which an ABORT packet arrives at $v$ and any messages in transit from $u$ to $v$ have been "flushed" out. However, this claim goes further and states that the point at which the ABORT packet arrives at $v$ occurs before $v$ sends message $m$.

Claim D.2.18 Consider any pair of neighbors $u, v$ and any execution $\alpha$ of $\mathcal{R}|L$. Consider any $i, j, k$ such that $i < j < k$. Suppose $s_i.status(u) = \mathit{on}$, $a_j$ is a $\mathrm{SIGNAL}_u$ event, and $a_k$ is a normal receive event at $u$ from $v$. Let $a_{k'}$ be the send corresponding to $a_k$. Then there is some $j'$ with $i < j' < k'$ such that $s_{j'}.status(v) \neq \mathit{on}$ and $s_{j'}.M_{u,v}$ is empty.

Proof: By Claim D.2.8 applied to $s_i$ and $a_j$, we know there is some $j'$ with $i < j' < j$ such that:

• $a_{j'}$ is a $\mathrm{RECEIVE}_{u,v}(\mathrm{ABORT}, *)$ action.

• $s_{j'}.mode(v) \neq \mathit{Ready}$.

• $s_{j'}.M_{u,v}$ does not contain any $\Sigma$-message.

As long as we can prove that $j' < k'$, we are done. Suppose not. By $\mathcal{A}$ we know that $s_{j'}.ack_u[v] = \mathit{true}$. But since $a_k$ is a receive action, $s_k.status(u) = \mathit{on}$, and so we know that $s_k.ack_u[v] = \mathit{false}$. Let $s_l$ be the first state after $s_{j'}$ such that $s_l.ack_u[v] = \mathit{false}$. Thus $j' < l < k$. But we know from the code that $ack_u[v]$ is only set to false after a $\mathrm{RECEIVE}_{v,u}(\mathrm{ACK})$ event, and so $a_l$ must be a $\mathrm{RECEIVE}_{v,u}(\mathrm{ACK})$ event. But by $\mathcal{B}$ we know that there is no ACK packet in $M_{v,u}$ in state $s_{j'}$. Thus there must be some $n$, $j' < n < k$, such that $a_n = \mathrm{SEND}_{v,u}(\mathrm{ACK})$. Intuitively, what we have shown is that in the interval $[s_{j'}, s_l]$ an ACK packet must have been sent by $v$ and received by $u$.

Now we can obtain the required contradiction. We only sketch the rest of the argument informally. Suppose, for contradiction, that $j' > k'$. Then the ACK (sent in action $a_n$) must have been sent after the user message (sent in action $a_{k'}$). But then (essentially because the channels and queues are FIFO), the corresponding user message receipt (i.e., action $a_k$) must have occurred before the corresponding ACK packet receipt (call this action $a_l$). Thus $k < l$. Also, since $j' < k$, we know that $k$ lies in the interval $[s_{j'}, s_{l-1}]$. But we know from $\mathcal{A}$ that $mode(u) \neq \mathit{Ready}$ throughout the interval $[s_{j'}, s_{l-1}]$, since in this interval $u$ is still waiting for an ACK from $v$. This contradicts the fact that in the state $s_k$ immediately following a receive event such as $a_k$, $mode(u)$ must be Ready. ∎

Next, we show a lemma (see Figure D.1) which states (in essence) that messages sent in a signal interval at $v$ can be received in at most one signal interval at $u$; conversely, messages received in a signal interval at $u$ could have been sent in at most one signal interval at $v$. This helps establish that each signal interval at $u$ can have at most one mated signal interval at $v$.

Lemma D.2.19 (Send Consistency) Consider any pair of neighbors $u, v$ and any execution $\alpha$ of $\mathcal{R}|L$. Let $a_j$ and $a_k$ be any two normal receive events at $u$ from $v$ in $\alpha$. Let $a_l$ and $a_m$ be the send events corresponding to $a_j$ and $a_k$, respectively. Then there is a $\mathrm{SIGNAL}_v$ event between $a_l$ and $a_m$ iff there is a $\mathrm{SIGNAL}_u$ event between $a_j$ and $a_k$.
Figure D.1: Send consistency: there is a signal between the two receives at $u$ iff there is a signal between the two corresponding sends at $v$.

Proof: Assume without loss of generality that $j < k$. Suppose there is a $\mathrm{SIGNAL}_v$ event, say $a_{m'}$, between $a_l$ and $a_m$. By Claim D.2.14, $s_l.status(v) = \mathit{on}$. Then by Claim D.2.8 there must be some $p$, $l < p < m'$, such that $s_p.status(u) = \mathit{off}$ and $M_{v,u}$ is empty in $s_p$. But by Claim D.2.14, $M_{v,u}$ is non-empty throughout the interval $[s_l, s_{j-1}]$. Thus, since $p > l$, it must be that $p \ge j$. Also $p < m'$, $m' < m$, and by Claim D.2.14, $m < k$. So $p < k$. Thus there is a state $s_p$ in the interval $[s_j, s_k]$ in which $status(u) = \mathit{off}$. Also, we know by Claim D.2.14 that $s_k.status(u) = \mathit{on}$. Thus, by Claim D.2.3, a $\mathrm{SIGNAL}_u$ event must occur in the interval $[s_j, s_k]$.

The reverse argument is slightly different. Suppose there is a $\mathrm{SIGNAL}_u$ event, say $a_{j'}$, between $a_j$ and $a_k$. By the code, $s_j.status(u) = \mathit{on}$. Thus, by Claim D.2.18 applied to $s_j$, $a_{j'}$ and $a_k$, we know that there is some $p$ such that $j < p < m$ and $s_p.status(v) \neq \mathit{on}$. But since $l < j$, $s_p$ occurs in the interval $[s_l, s_m]$ and $status(v) = \mathit{off}$ there. Also, we know by Claim D.2.14 that $s_m.status(v) = \mathit{on}$. Thus, by Claim D.2.3, a $\mathrm{SIGNAL}_v$ event must occur in the interval $[s_l, s_m]$. ∎
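Lemma D.2.19 is a statement about event times, so it can be phrased as a check over a recorded execution. The hypothetical checker below takes two inputs:

- The (send time at $v$, receive time at $u$) pairs of normal messages.
- The sorted times of the $\mathrm{SIGNAL}_v$ and $\mathrm{SIGNAL}_u$ events.

It then tests the "iff" for every pair of messages. It is a test harness under these assumptions, not part of the protocol, and the function names are illustrative.

```python
from bisect import bisect_right


def signal_between(signal_times, t1, t2):
    """True if a signal occurs strictly between t1 and t2 (signal_times sorted)."""
    lo, hi = min(t1, t2), max(t1, t2)
    i = bisect_right(signal_times, lo)   # first signal strictly after lo
    return i < len(signal_times) and signal_times[i] < hi


def send_consistent(pairs, signals_u, signals_v):
    """pairs: (send time at v, receive time at u) for normal messages from v to u."""
    for s1, r1 in pairs:
        for s2, r2 in pairs:
            at_v = signal_between(signals_v, s1, s2)
            at_u = signal_between(signals_u, r1, r2)
            if at_v != at_u:             # the "iff" of Lemma D.2.19 fails
                return False
    return True
```

In the first example test, replacing `signals_u=[4]` by `signals_u=[]` makes the check fail. In that case a signal separates the two sends at $v$ but not the two receives at $u$.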

Next, we show a second lemma (see Figure D.2), which states (in essence) that a signal interval at $u$ cannot send messages to and receive messages from different signal intervals at $v$. This helps show that the mating relation is symmetric.

Lemma D.2.20 (Send-Receive Consistency) Consider any pair of neighbors $u, v$ and any execution $\alpha$ of $\mathcal{R}|L$. Let $a_j$ be a normal receive event at $u$ from $v$, and let $a_m$ be a normal receive event at $v$ from $u$. Let $a_l$ and $a_k$ be the send events corresponding to $a_j$ and $a_m$, respectively. Then there is a $\mathrm{SIGNAL}_v$ event between $a_l$ and $a_m$ iff there is a $\mathrm{SIGNAL}_u$ event between $a_j$ and $a_k$.
Figure D.2: Send-Receive consistency: there is a signal between the receive and the send at $u$ iff there is a signal between the corresponding send and receive at $v$.

Proof: Assume that $k > j$, as shown in Figure D.2. The other cases are similar.

Suppose there is a $\mathrm{SIGNAL}_u$ event, say $a_{j'}$, between $a_j$ and $a_k$. By Claim D.2.14, $s_j.status(u) = \mathit{on}$. Then by Claim D.2.8 there must be some $p$, $j < p < j'$, such that $s_p.status(v) = \mathit{off}$. Thus, since $p > j$ and $j > l$, it must be that $p > l$. Also $p < k$ and $k < m$, so $p < m$. Thus there is a state $s_p$ in the interval $[s_l, s_m]$ in which $status(v) = \mathit{off}$. Also, we know by Claim D.2.14 that $s_m.status(v) = \mathit{on}$. Thus, by Claim D.2.3, a $\mathrm{SIGNAL}_v$ event must occur in the interval $[s_l, s_m]$.

The reverse argument is slightly different. Suppose there is a $\mathrm{SIGNAL}_v$ event, say $a_{m'}$, between $a_l$ and $a_m$. By the code, $s_l.status(v) = \mathit{on}$. Thus, by Claim D.2.18 applied to $s_l$, $a_{m'}$ and $a_m$, we know that there is some $p$ such that $p < k$, $s_p.status(u) \neq \mathit{on}$, and $s_p.M_{v,u}$ does not contain a $\Sigma$-message. We claim that $j \le p$. If not, $s_p$ must lie in the interval $[s_l, s_{j-1}]$, and in this interval we know by Claim D.2.14 that there is always a $\Sigma$-message in $M_{v,u}$. But this contradicts the fact that $s_p.M_{v,u}$ does not contain a $\Sigma$-message.

Thus $s_p$ occurs in the interval $[s_j, s_k]$ and $status(u) \neq \mathit{on}$ there. Also, we know by Claim D.2.14 that $s_k.status(u) = \mathit{on}$. Thus, by Claim D.2.3, a $\mathrm{SIGNAL}_u$ event must occur in the interval $[s_j, s_k]$. ∎

We now show the third part of the consistency property (see Definition 7.3.5). 

Lemma D.2.21 (Successful Sending of Messages) Consider any pair of neighbors $u, v$ and any execution $\alpha$ of $\mathcal{R}|L$. Between any safe $\mathrm{SENDM}_{v,u}(m)$ event and a later $\mathrm{FREEM}_{v,u}$ event, there is either a $\mathrm{RECEIVEM}_{v,u}(m)$ event or a $\mathrm{SIGNAL}_v$ event.
Figure D.3: Mating of Final Signal Intervals: messages sent in a final interval can only be received in another final interval.

Proof: Let us denote the $\mathrm{SENDM}_{v,u}(m)$ event by $a_i$ and the $\mathrm{FREEM}_{v,u}$ event by $a_k$. Clearly $i < k$.

By the code, $status(v) = \mathit{on}$ in $s_k$. Also by the code, $freem_v[u] = \mathit{true}$ in $s_k$, and hence, by $\mathcal{H}$, $M_{v,u}$ does not contain any $\Sigma$-message in $s_k$.

Next, consider $a_i$. Since $a_i$ is safe, by definition there must be an action $a_h = \mathrm{FREEM}_{v,u}$ such that $h < i$ and there is no other $\mathrm{SENDM}_{v,u}(*)$ action between $a_h$ and $a_i$. Thus, by the code, $freem_v[u] = \mathit{true}$ in $s_h$ and $s_{i-1}$. Hence, by the code, either $s_i.status(v) = \mathit{off}$ or $m$ belongs to $M_{v,u}$ in $s_i$ (i.e., the message $m$ is placed on the queue at $v$ to send to $u$). In the first case, we are done by Claim D.2.3, which tells us that there must be a $\mathrm{SIGNAL}_v$ event between $s_i$ and $s_k$.

So consider the second case, where $m$ belongs to $M_{v,u}$ in $s_i$ and $s_i.status(v) = \mathit{on}$. We know that in the later state $s_k$, $m$ does not belong to $M_{v,u}$. Now, from the code, the only two actions that could remove $m$ from $M_{v,u}$ in the interval $[s_i, s_k]$ are a $\mathrm{RECEIVEM}_{v,u}(m)$ event or a $\mathrm{RECEIVE}_{v,u}(\mathrm{ABORT}, *)$ event. In the former case, we are done, so consider the latter case. Because $s_i.status(v) = \mathit{on}$, we know from $\mathcal{A}$ that there is no $(\mathrm{ABORT}, *)$ packet in $s_i.xqueue_v[u]$. Thus there must have been an action $a_j = \mathrm{SEND}_{v,u}(\mathrm{ABORT}, *)$ in the interval $[s_i, s_k]$. From the code, $s_j.status(v) = \mathit{off}$. The lemma now follows from Claim D.2.3, which tells us that there must be a $\mathrm{SIGNAL}_v$ event between $s_j$ and $s_k$. ∎

We now show the fourth part of the consistency property (see Definition 7.3.5). 
The lemma on which it is based is sketched in Figure D.3. 
Lemma D.2.22 (Mating of Final Signal Intervals) Consider any pair of neighbors $u, v$ and any execution $\alpha$ of $\mathcal{R}|L$. Let $a_j$ be a normal receive event at $u$ from $v$ in $\alpha$, and let $a_l$ be the send corresponding to $a_j$. Then there is no $\mathrm{SIGNAL}_u$ event after $a_j$ iff there is no $\mathrm{SIGNAL}_v$ event after $a_l$.

Proof: Suppose there is a $\mathrm{SIGNAL}_v$ event $a_n$ after $a_l$. Then by Claim D.2.8, there is a state $s_m$ between $a_l$ and $a_n$ such that $s_m.mode(u) \neq \mathit{Ready}$ and $s_m.M_{v,u}$ does not contain any $\Sigma$-message. It follows from Claim D.2.14 that $s_m$ must occur after $a_j$. Thus there is a state after $a_j$ in which $mode(u) \neq \mathit{Ready}$. Hence, by the Signal Lemma (Lemma D.2.2), a $\mathrm{SIGNAL}_u$ event occurs after $a_j$.

The reverse argument is similar but slightly simpler. ∎

Lemma D.2.23 (Mating Relation Preserves Temporal Ordering) Consider any pair of neighbors $u, v$ and any execution $\alpha$ of $\mathcal{R}|L$. Suppose a signal interval $S_u$ at $u$ is mated to a signal interval $S_v$ at $v$, and a signal interval $S'_u$ at $u$ is mated to a signal interval $S'_v$ at $v$. Then if $S'_u$ occurs later than $S_u$, $S'_v$ occurs later than $S_v$.

Proof: Omitted. It follows, in essence, from the FIFO properties of the underlying UDLs and the fact that ABORT packets sent between signal intervals flush the links and buffers of previously sent messages. ∎

And now we come to the main result of this subsection: 

Lemma D.2.24 Every behavior of $\mathcal{R}|L$ satisfies the consistency property.

Proof: We define a signal interval at a node $u$ in an execution $\alpha$ by analogy with the definitions for behaviors. We define two signal intervals $I_u$ at $u$ and $I_v$ at $v$ to be mates if any normal message received in $I_u$ was sent in $I_v$ or vice versa. Lemmas D.2.19 and D.2.20 show that the mating relation is well defined, that any signal interval at $u$ can have at most one mate, and that the relation is symmetric. The last three conditions in Definition 7.3.5 follow from Lemmas D.2.21, D.2.22, and D.2.23. ∎
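Several properties can be checked mechanically on a recorded execution:

- The definition of mates used in this proof.
- The "at most one mate" property from Lemmas D.2.19 and D.2.20.
- The ordering property of Lemma D.2.23.

In the hypothetical sketch below, the signal intervals at a node are numbered by how many signals at that node precede a given time. The trace format and function names are illustrative assumptions.

```python
from bisect import bisect_right


def interval_of(signal_times, t):
    """Index of the signal interval containing time t (0 = before the first signal)."""
    return bisect_right(signal_times, t)


def mating(msgs_uv, msgs_vu, signals_u, signals_v):
    """Pairs (interval at u, interval at v) linked by some normal message.

    msgs_uv: (send time at u, receive time at v) for messages from u to v.
    msgs_vu: (send time at v, receive time at u) for messages from v to u.
    """
    rel = set()
    for s, r in msgs_uv:
        rel.add((interval_of(signals_u, s), interval_of(signals_v, r)))
    for s, r in msgs_vu:
        rel.add((interval_of(signals_u, r), interval_of(signals_v, s)))
    return rel


def well_formed(rel):
    """Each interval has at most one mate, and mating preserves temporal order."""
    mate_of_u, mate_of_v = {}, {}
    for a, b in rel:
        if mate_of_u.setdefault(a, b) != b or mate_of_v.setdefault(b, a) != a:
            return False
    ordered = sorted(rel)
    return all(b1 < b2 for (_, b1), (_, b2) in zip(ordered, ordered[1:]))
```

Because `rel` is built from both directions of the link, a single relation captures the symmetry of mating. `well_formed` then rejects any interval with two mates, as well as any crossing pair that would violate Lemma D.2.23.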
D.2.4 Every Behavior of $\mathcal{R}|L$ is causal

We can now prove that the Reset protocol is causal. We first prove the first property in Definition 7.3.6.

Theorem D.2.25 Consider any execution $\alpha = s_0, a_1, s_1, \ldots$ of $\mathcal{R}|L$. There is some constant $c$ such that every signal event $a_k$ that occurs at time greater than $s_0.time + cn$ is preceded by a request event $a_j$ such that $a_k.time - a_j.time < cn$.

Proof: A formal proof can be patterned after the intuitive argument given in Section 7.7. ∎

Next, we prove the second property in Definition 7.3.6. 

Lemma D.2.26 Consider an execution $\alpha = s_0, a_1, s_1, \ldots$ of $\mathcal{R}|L$. There is some constant $c$ such that a $\mathrm{SIGNAL}_u$ event occurs within $cn$ time of any $\mathrm{REQUEST}_u$ event.

Proof: We know from the code that in the state immediately following a $\mathrm{REQUEST}_u$ event, $status(u) = \mathit{off}$. The lemma follows immediately from the Signal Lemma (Lemma D.2.2). ∎

And now we come to the main result of this subsection:

Lemma D.2.27 Every behavior of $\mathcal{R}\|\mathcal{L}$ is causal.

Proof: Immediate from Definition 7.3.6, Theorem D.2.25, and Lemma D.2.26. ∎

D.2.5 The Main Theorem 

Finally, we can state our main theorem of this section, which follows from the main 
lemmas of the last three subsections: 

Theorem D.2.28 Every behavior of $\mathcal{R}\|\mathcal{L}$ is in RP.

Proof: Immediate from Definition 7.3.7 and Lemmas D.2.17, D.2.24, and D.2.27. ∎
Appendix E

Dijkstra's Token Protocol as an Example of Counter Flushing



In Chapter 10, we described a paradigm called counter flushing. We now show that Dijkstra's
first example protocol in [Dij74] can be understood simply using this paradigm.

Dijkstra's first example is modelled by the automaton D1 shown in Figure E.1.
As in the previous example, the nodes (once again numbered from 0 to $n-1$) are
arranged such that node 1 has node 0 and node 2 as its neighbors, and so on. However,
in this case we also assume that Process 0 and Process $n-1$ are neighbors. In other words,
by making 0 and $n-1$ adjacent we have made the line into a ring. For Process $i$, we
call Process $i-1$ the anticlockwise neighbor of $i$ and Process $i+1$ the clockwise neighbor
of $i$ (all arithmetic on process indices is mod $n$; counters, as described next, are
incremented mod $n+1$).

Each node has a counter $count_i$ in the range $0, \ldots, n$ that is incremented mod $n+1$.
Once again, the easiest way to understand this protocol is to see what happens
when it is properly initialized. Assume that initially Process 0 has its counter set
to 1 while all other processes have their counters set to 0. Processes other than 0 are
only allowed to "move" (see Figure E.1) when their counter differs in value from that
of their anticlockwise neighbor; in this case, the process makes a move by setting its
counter equal to that of its anticlockwise neighbor. Thus initially only Process 1 can
make a move, after which Process 1 has its counter equal to 1; next, only Process 2
can move, after which Process 2 sets its counter equal to 1; and so on, as the value 1
moves clockwise around the ring until all processes have their counter equal to 1.
The state of the system consists of an integer variable $count_i \in \{0, \ldots, n\}$, one for every process in the ring.
We assume that Process 0 and Process $n-1$ are neighbors.
In the initial state $count_i = 0$ for $i = 1, \ldots, n-1$ and $count_0 = 1$.

Move$_0$ (*action for Process 0 only*)
    Precondition: $count_0 = count_{n-1}$ (*equal to anticlockwise neighbor?*)
    Effect: $count_0 := (count_0 + 1) \bmod (n+1)$ (*increment counter*)

Move$_i$, $1 \le i \le n-1$ (*action for other processes*)
    Precondition: $count_i \ne count_{i-1}$ (*not equal to anticlockwise neighbor?*)
    Effect: $count_i := count_{i-1}$ (*set equal to anticlockwise neighbor*)

Each action is in a separate class.

Figure E.1: Automaton D1: a version of Dijkstra's first example with initial states. The protocol does
token passing on a ring using nodes with $n+1$ states.
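A direct Python transcription of Figure E.1 may help when experimenting with the protocol. It is a sketch only: the state is a list `count` indexed by process number, the function names are ours, and each call to `move` performs one atomic action, which is how we assume the automaton's actions are interleaved.

```python
from typing import List


def initial_state(n: int) -> List[int]:
    """Proper initialization of D1: count_0 = 1 and count_i = 0 for i = 1..n-1."""
    return [1] + [0] * (n - 1)


def enabled(count: List[int], i: int) -> bool:
    """Precondition of Move_i; Process i's anticlockwise neighbor is (i - 1) mod n."""
    n = len(count)
    left = count[(i - 1) % n]
    if i == 0:
        return count[0] == left   # Move_0: equal to anticlockwise neighbor
    return count[i] != left       # Move_i: differs from anticlockwise neighbor


def move(count: List[int], i: int) -> List[int]:
    """Effect of Move_i, returned as a new state (the input is not modified)."""
    assert enabled(count, i), f"Move_{i} is not enabled"
    n = len(count)
    new = list(count)
    if i == 0:
        new[0] = (count[0] + 1) % (n + 1)   # increment mod n + 1
    else:
        new[i] = count[i - 1]               # copy anticlockwise neighbor
    return new
```

With $n = 4$, only Process 1 is enabled in `initial_state(4)`, and `move(initial_state(4), 1)` gives `[1, 1, 0, 0]`.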



Figure E.2: In the good states for Dijkstra's first example, the ring can be partitioned into two bands, with
the token at the boundary.

Process 0, on the other hand, cannot make a move until Process $n-1$ has the
same counter value as Process 0. Thus until Process $n-1$ sets its counter to 1, Process 0
cannot make a move. When this happens, however, Process 0 increments its counter
mod $n+1$. The cycle then repeats as the value 2 begins to move across the ring
(assuming $n \ge 2$), and so on. Thus after proper initialization, this system performs
a form of token passing on a ring: as before, a node is considered to have the token
when the system is in a state in which that node can make a move.
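The token-passing behavior of a properly initialized execution can be checked with a short simulation. This Python sketch (function names are ours) starts D1 in its initial state, asserts that exactly one process is enabled in every state it visits, and records which process holds the token at each step.

```python
from typing import List


def holders(count: List[int]) -> List[int]:
    """Processes whose Move action is enabled, i.e. that hold the token."""
    n = len(count)
    return [i for i in range(n)
            if (count[0] == count[n - 1] if i == 0 else count[i] != count[i - 1])]


def token_trace(n: int, steps: int) -> List[int]:
    """Run a properly initialized D1 for `steps` moves; return the token holders."""
    count = [1] + [0] * (n - 1)          # count_0 = 1, all others 0
    trace = []
    for _ in range(steps):
        h = holders(count)
        assert len(h) == 1, f"expected one token holder, found {h}"
        i = h[0]
        trace.append(i)
        # Move_0 increments mod n + 1; Move_i copies the anticlockwise neighbor.
        count[i] = (count[0] + 1) % (n + 1) if i == 0 else count[i - 1]
    return trace
```

With $n = 4$, `token_trace(4, 9)` yields `[1, 2, 3, 0, 1, 2, 3, 0, 1]`: the token visits Processes 1 to $n-1$ while the value 1 spreads, then returns to Process 0, which increments its counter to 2.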

The good global states of the protocol are sketched in Figure E.2. Notice that
the ring can be partitioned into two bands. All counter values within a band are equal,
and the band that includes the top node (Process 0) has a counter value one higher than
the lower band. The token is at the boundary between the two bands; after the node holding
the token makes a move, the top band becomes larger and the lower band becomes smaller.
When all counters are equal, the lower band is empty and the token is at Process 0.

It is easy to see that the system is in a good state iff the following local predicates
are true (with counter arithmetic mod $n+1$):

• For $i = 1, \ldots, n-1$, either $count_{i-1} = count_i$ or $count_{i-1} = count_i + 1$.

• Either $count_0 = count_{n-1}$ or $count_0 = count_{n-1} + 1$.
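For small rings, the claim that these two local predicates characterize the good states can be checked by brute force. The Python sketch below (names are ours) evaluates the predicates with counter arithmetic mod $n+1$, computes the set of states reachable from the proper initial state by breadth-first search, and compares it with the set of all states that satisfy the predicates.

```python
from collections import deque
from itertools import product
from typing import Iterator, Set, Tuple

State = Tuple[int, ...]


def is_good(s: State) -> bool:
    """The two local predicates for D1, with counter arithmetic mod n + 1."""
    n, m = len(s), len(s) + 1
    inner = all(s[i - 1] in (s[i], (s[i] + 1) % m) for i in range(1, n))
    wrap = s[0] in (s[n - 1], (s[n - 1] + 1) % m)
    return inner and wrap


def successors(s: State) -> Iterator[State]:
    """All states obtainable from s by one enabled Move action."""
    n = len(s)
    if s[0] == s[n - 1]:                           # Move_0
        yield ((s[0] + 1) % (n + 1),) + s[1:]
    for i in range(1, n):
        if s[i] != s[i - 1]:                       # Move_i
            yield s[:i] + (s[i - 1],) + s[i + 1:]


def reachable(n: int) -> Set[State]:
    """States reachable from the proper initial state, by breadth-first search."""
    start: State = (1,) + (0,) * (n - 1)
    seen = {start}
    queue = deque([start])
    while queue:
        for t in successors(queue.popleft()):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen


def satisfying(n: int) -> Set[State]:
    """All states in {0, ..., n}^n that satisfy the local predicates."""
    return {s for s in product(range(n + 1), repeat=n) if is_good(s)}
```

For $n = 2, \ldots, 5$ the two sets coincide, and both contain $n(n+1)$ states: the $n+1$ states in which all counters are equal, plus $(n-1)(n+1)$ states with one band boundary strictly inside the ring.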
The system is locally checkable, but it does not appear to be locally correctable.
However, it does stabilize using a paradigm that we call counter flushing. Even
if the counter values are arbitrarily initialized (in the range $0, \ldots, n$), the system
eventually begins executing as some suffix of a properly initialized execution. We
prove this informally using three claims:

• In any execution, Process 0 will eventually increment its counter. Suppose
not. Then, since Process 0 is the only process that can "produce" new counter values,
the number of distinct counter values cannot increase. Since $count_0$ never changes,
moves by processes other than 0 copy $count_0$ clockwise around the ring (Process 1
eventually adopts and keeps $count_0$, then Process 2, and so on) until all counters are
equal. But then Process 0 is enabled and will increment its counter, a contradiction.

• In any execution, Process 0 will eventually reach a "fresh" counter
value that is not equal to the counter value of any other process. To see this,
note that in the initial state there are at most $n$ distinct counter values, one per
process, while there are $n+1$ possible values. Thus there is some counter value, say
$m$, that is not present in the initial state. Since Process 0 keeps incrementing its
counter (by the first claim, applied repeatedly), Process 0 will eventually reach $m$;
in the interim, no other process can set its counter value to $m$, because processes
other than 0 only copy values that are already present.

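• Any state in which Process 0 has a fresh counter value $m$ is eventually
followed by a state in which all processes have counter value $m$. It is
easy to see that the value $m$ moves clockwise around the ring, "flushing" any
other counter values, while Process 0 remains at $m$: Process 0 cannot move again
until Process $n-1$ holds $m$, and since $m$ can only spread clockwise from Process 0,
by then every process holds $m$. This is why we call this paradigm counter flushing.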
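The three claims can be watched in action. The following Python sketch is our own construction, and its random scheduler is an assumption (the argument above does not prescribe a scheduler): it starts D1 from an arbitrary state, runs until Process 0 holds a fresh value $m$, and then, at the first moment Process 0 is enabled again, asserts that every counter equals $m$, which is the flushing step of the third claim.

```python
import random
from typing import List, Optional


def enabled(c: List[int], i: int) -> bool:
    """Precondition of Move_i in D1 (process indices mod n)."""
    n = len(c)
    return c[0] == c[n - 1] if i == 0 else c[i] != c[i - 1]


def flush_demo(start: List[int], rng: random.Random, max_steps: int = 100_000) -> int:
    """Run D1 from an arbitrary state under a random scheduler (an assumption).

    Once Process 0 holds a value m that no other process holds (a fresh value),
    it stays disabled until m has been flushed around the ring; at that point
    we assert that every counter equals m, and return m.
    """
    c = list(start)
    n = len(c)
    fresh: Optional[int] = None
    for _ in range(max_steps):
        if fresh is None and c[0] not in c[1:]:
            fresh = c[0]                     # claim 2: Process 0 holds a fresh value
        if fresh is not None and enabled(c, 0):
            assert all(x == fresh for x in c), "fresh value was not flushed"
            return fresh                     # claim 3: every process now holds m
        i = rng.choice([j for j in range(n) if enabled(c, j)])
        c[i] = (c[0] + 1) % (n + 1) if i == 0 else c[i - 1]
    raise RuntimeError("step bound exceeded")
```

The list of enabled processes is never empty: if no process other than 0 is enabled, all counters are equal, and then Process 0 is enabled.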

The net effect is that any execution of D1 eventually reaches a good state, in which
it remains thereafter. The reader should compare our proof for this protocol with the
general description of counter flushing found in Chapter 10.
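For small rings, this net effect can also be confirmed exhaustively rather than by sampling. The Python sketch below (our own construction, not from the thesis) checks closure, meaning that every move from a good state leads to a good state, and convergence, meaning that no state is deadlocked and that the transition graph restricted to bad states has no cycle. Together these imply that every execution, in whatever order enabled moves are taken, leaves the bad states after finitely many moves. The acyclicity test uses Kahn's topological-sort algorithm.

```python
from collections import deque
from itertools import product
from typing import Dict, List, Tuple

State = Tuple[int, ...]


def is_good(s: State) -> bool:
    """Local predicates of D1, with counter arithmetic mod n + 1."""
    n, m = len(s), len(s) + 1
    return (all(s[i - 1] in (s[i], (s[i] + 1) % m) for i in range(1, n))
            and s[0] in (s[n - 1], (s[n - 1] + 1) % m))


def successors(s: State) -> List[State]:
    """States obtainable from s by one enabled Move action."""
    n = len(s)
    out = []
    if s[0] == s[n - 1]:                               # Move_0 enabled
        out.append(((s[0] + 1) % (n + 1),) + s[1:])
    for i in range(1, n):
        if s[i] != s[i - 1]:                           # Move_i enabled
            out.append(s[:i] + (s[i - 1],) + s[i + 1:])
    return out


def check_stabilization(n: int) -> Tuple[bool, bool]:
    """Return (closure, convergence) for a ring of n processes."""
    states = list(product(range(n + 1), repeat=n))
    closure = all(is_good(t) for s in states if is_good(s) for t in successors(s))
    deadlock_free = all(successors(s) for s in states)
    # Kahn's algorithm on the subgraph of bad states: it removes every
    # node exactly when that subgraph is acyclic.
    bad = [s for s in states if not is_good(s)]
    bad_set = set(bad)
    indegree: Dict[State, int] = {s: 0 for s in bad}
    for s in bad:
        for t in successors(s):
            if t in bad_set:
                indegree[t] += 1
    queue = deque(s for s in bad if indegree[s] == 0)
    removed = 0
    while queue:
        s = queue.popleft()
        removed += 1
        for t in successors(s):
            if t in bad_set:
                indegree[t] -= 1
                if indegree[t] == 0:
                    queue.append(t)
    return closure, deadlock_free and removed == len(bad)
```

Calling `check_stabilization(n)` for $n = 2, \ldots, 5$ covers all $(n+1)^n$ initial states of each ring size.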